bobmerevel opened a new issue, #16838:
URL: https://github.com/apache/iceberg/issues/16838

   ### Apache Iceberg version
   
   1.9.2
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   Versions
   
   * Apache Iceberg: 1.9.2
   * Spark Runtime: iceberg-spark-runtime-3.5
   * Spark Version: 3.5.7
   
   Description
   
   While implementing a custom Iceberg REST Catalog for our product, I 
encountered a NullPointerException during snapshot expiration.
   
   The root cause turned out to be that my REST Catalog accidentally returned:
   
   {
     "metadata_location": "..."
   }
   
   instead of the expected REST field:
   
   {
     "metadata-location": "..."
   }
   
   Because the field is misnamed, the client never finds metadata-location, so 
TableMetadata.metadataFileLocation() returns null.
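   
   A minimal sketch of why the misnamed key goes unnoticed, assuming the client 
looks the field up by its exact name. FieldNameDemo and its method are 
hypothetical stand-ins for illustration, not Iceberg's actual response parser:
   
```java
import java.util.Map;

// Hypothetical stand-in for a response parser; not Iceberg's real code.
class FieldNameDemo {
  // Field name the REST response is expected to carry.
  static final String METADATA_LOCATION = "metadata-location";

  // Returns the metadata location, or null when the key is absent.
  static String metadataLocation(Map<String, Object> response) {
    Object value = response.get(METADATA_LOCATION);
    return value == null ? null : value.toString();
  }

  public static void main(String[] args) {
    Map<String, Object> wrong = Map.of("metadata_location", "...");
    Map<String, Object> right = Map.of("metadata-location", "...");
    System.out.println(metadataLocation(wrong)); // prints "null"
    System.out.println(metadataLocation(right)); // prints "..."
  }
}
```
   
   A lookup by exact key raises no error for the underscore variant; it simply 
yields null, which then travels on until something dereferences it.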
   
   Interestingly, normal table reads continue to work correctly because Iceberg 
is still able to discover and load the current metadata through the table 
location.
   
   However, after a successful remove-snapshots commit, 
ExpireSnapshotsSparkAction reloads the table metadata from the catalog and 
later constructs a static table using:
   
   protected Table newStaticTable(TableMetadata metadata, FileIO io) {
       StaticTableOperations ops = new StaticTableOperations(metadata, io);
       return new BaseTable(ops, metadata.metadataFileLocation());
   }
   
   Since metadata.metadataFileLocation() is null, the resulting BaseTable is 
created with a null metadata location.
   
   Later, during file expiration (fileDS()), this causes an unexpected 
NullPointerException inside the Spark job execution.
   
   Relevant code path
   
   private Dataset<FileInfo> fileDS(TableMetadata metadata, Set<Long> 
snapshotIds) {
       Table staticTable = this.newStaticTable(metadata, this.table.io());
       return this.contentFileDS(staticTable, snapshotIds)
           .union(this.manifestDS(staticTable, snapshotIds))
           .union(this.manifestListDS(staticTable, snapshotIds))
           .union(this.statisticsFileDS(staticTable, snapshotIds));
   }
   
   which starts by calling newStaticTable (repeated here for reference):
   
   protected Table newStaticTable(TableMetadata metadata, FileIO io) {
       StaticTableOperations ops = new StaticTableOperations(metadata, io);
       return new BaseTable(ops, metadata.metadataFileLocation());
   }
   
   Expected behavior
   
   If metadata.metadataFileLocation() is required for 
ExpireSnapshotsSparkAction, Iceberg should fail fast with a descriptive error 
such as:
   
   Table metadata file location is null.
   This may indicate an invalid REST Catalog response or missing 
metadata-location field.
   
   instead of continuing and eventually failing with an unrelated 
NullPointerException.
   
   Actual behavior
   
   The action proceeds until later stages and eventually fails with a 
NullPointerException, making the root cause difficult to identify.
   
   Root cause
   
   In my case, the issue was an incorrect REST Catalog response field name 
(metadata_location instead of metadata-location).
   
   After fixing the REST response to return the correct metadata-location 
property, the issue disappeared completely.
   
   Suggestion
   
   It would be helpful to add an explicit null check around 
metadata.metadataFileLocation() in BaseSparkAction.newStaticTable() (or 
earlier) and throw a meaningful exception explaining that the REST Catalog 
returned an invalid or incomplete table metadata response.
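   
   A rough sketch of the kind of check I have in mind. The helper name, its 
placement, and the choice of IllegalStateException are all assumptions on my 
part; Iceberg may well prefer its own precondition utilities or exception types:
   
```java
// Hypothetical helper; name and exception type are assumptions, not Iceberg's API.
class MetadataLocationCheck {
  // Returns the location unchanged, or fails fast with a descriptive message.
  static String requireMetadataLocation(String metadataFileLocation) {
    if (metadataFileLocation == null) {
      throw new IllegalStateException(
          "Table metadata file location is null. This may indicate an invalid "
              + "REST Catalog response or missing metadata-location field.");
    }
    return metadataFileLocation;
  }
}
```
   
   newStaticTable could then wrap metadata.metadataFileLocation() in this call 
before constructing the BaseTable, so the failure surfaces on the driver rather 
than inside the Spark job.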
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [x] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

