bobmerevel opened a new issue, #16838:
URL: https://github.com/apache/iceberg/issues/16838
### Apache Iceberg version
1.9.2
### Query engine
Spark
### Please describe the bug 🐞
Iceberg Version
* Apache Iceberg: 1.9.2
* Spark Runtime: iceberg-spark-runtime-3.5
* Spark Version: 3.5.7
Description
While implementing a custom Iceberg REST Catalog for our product, I
encountered a NullPointerException during snapshot expiration.
The root cause turned out to be that my REST Catalog accidentally returned:
    {
      "metadata_location": "..."
    }
instead of the expected REST field:
    {
      "metadata-location": "..."
    }
Because of this typo, TableMetadata.metadataFileLocation() becomes null.
Interestingly, normal table reads continue to work correctly because Iceberg
is still able to discover and load the current metadata through the table
location.
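A minimal sketch of why the misspelled key goes unnoticed, assuming the client looks the field up by its exact name and treats a missing key as null. The class and method names are hypothetical, the parsed response is modeled as a plain Map rather than Iceberg's actual parser, and the storage path is a placeholder:

```java
import java.util.Map;

public class MetadataLocationLookup {
    // Field name expected in the REST response, as described in this issue.
    static final String METADATA_LOCATION = "metadata-location";

    // Hypothetical stand-in for a lenient parser: returns null when the key is absent.
    static String metadataLocation(Map<String, String> response) {
        return response.get(METADATA_LOCATION);
    }

    public static void main(String[] args) {
        // Placeholder path, for illustration only.
        String path = "s3://bucket/tbl/metadata/v1.metadata.json";
        Map<String, String> misspelled = Map.of("metadata_location", path);
        Map<String, String> correct = Map.of("metadata-location", path);

        // The underscore variant is simply an unknown key, so the lookup yields null.
        System.out.println(metadataLocation(misspelled));
        // The hyphenated key is found and the path is returned.
        System.out.println(metadataLocation(correct));
    }
}
```

Nothing fails at lookup time in this model, which matches the behavior reported here: the null only surfaces later, when something dereferences the location.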
However, after a successful remove-snapshots commit,
ExpireSnapshotsSparkAction reloads the table metadata from the catalog and
later constructs a static table using:
    protected Table newStaticTable(TableMetadata metadata, FileIO io) {
      StaticTableOperations ops = new StaticTableOperations(metadata, io);
      return new BaseTable(ops, metadata.metadataFileLocation());
    }
Since metadata.metadataFileLocation() is null, the resulting BaseTable is
created with a null metadata location.
Later, during file expiration (fileDS()), this causes an unexpected
NullPointerException inside the Spark job execution.
Relevant code path
    private Dataset<FileInfo> fileDS(TableMetadata metadata, Set<Long> snapshotIds) {
      Table staticTable = this.newStaticTable(metadata, this.table.io());
      return this.contentFileDS(staticTable, snapshotIds)
          .union(this.manifestDS(staticTable, snapshotIds))
          .union(this.manifestListDS(staticTable, snapshotIds))
          .union(this.statisticsFileDS(staticTable, snapshotIds));
    }
Its first step is the newStaticTable() call shown earlier, which passes
metadata.metadataFileLocation() to BaseTable without checking it for null.
Expected behavior
If metadata.metadataFileLocation() is required for
ExpireSnapshotsSparkAction, Iceberg should fail fast with a descriptive error
such as:
    Table metadata file location is null.
    This may indicate an invalid REST Catalog response or missing
    metadata-location field.
instead of continuing and eventually failing with an unrelated
NullPointerException.
Actual behavior
The action proceeds past the commit and only fails later, inside the Spark
job, with a NullPointerException that gives no hint of the missing metadata
location, making the root cause difficult to identify.
Root cause
In my case, the issue was an incorrect REST Catalog response field name
(metadata_location instead of metadata-location).
After fixing the REST response to return the correct metadata-location
property, the issue disappeared completely.
Suggestion
It would be helpful to add an explicit null check around
metadata.metadataFileLocation() in BaseSparkAction.newStaticTable() (or
earlier) and throw a meaningful exception explaining that the REST Catalog
returned an invalid or incomplete table metadata response.
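
A rough sketch of the kind of fail-fast check suggested above, written as a standalone helper so it runs without Iceberg on the classpath. The class name, method name, and choice of IllegalStateException are assumptions for illustration, not existing Iceberg code:

```java
public class MetadataLocationCheck {
    // Hypothetical helper: rejects a null location before it reaches BaseTable.
    static String requireMetadataLocation(String metadataFileLocation) {
        if (metadataFileLocation == null) {
            throw new IllegalStateException(
                "Table metadata file location is null. This may indicate an invalid "
                    + "REST Catalog response or missing metadata-location field.");
        }
        return metadataFileLocation;
    }

    public static void main(String[] args) {
        // A non-null location passes through unchanged (placeholder path).
        System.out.println(requireMetadataLocation("s3://bucket/tbl/metadata/v1.metadata.json"));

        // A null location fails immediately with a descriptive message.
        try {
            requireMetadataLocation(null);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Inside newStaticTable(), the equivalent would be to wrap metadata.metadataFileLocation() in such a check before constructing the BaseTable. Whether the project would prefer a different exception type, or a check placed earlier in the REST response handling, is for the maintainers to decide.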
### Willingness to contribute
- [ ] I can contribute a fix for this bug independently
- [x] I would be willing to contribute a fix for this bug with guidance from
the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]