bobmerevel commented on issue #16838:
URL: https://github.com/apache/iceberg/issues/16838#issuecomment-4722667824

   After investigating this further, I think there is an opportunity to improve 
the failure mode in ExpireSnapshotsSparkAction.
   
   In my opinion, the action could validate metadata.metadataFileLocation() 
before performing the commit. This would prevent a successful metadata commit 
followed by a failure during expired file calculation due to an invalid 
TableMetadata.
   
   For example, immediately after:
   
   TableMetadata originalMetadata = ops.current();
   
   the action could verify that originalMetadata.metadataFileLocation() is not 
null and fail with a descriptive error message if it is.
   
   Similarly, after:
   
   TableMetadata updatedMetadata = ops.refresh();
   
   the refreshed metadata could be validated before it is used to construct 
static tables.
   
   Additionally, it may be beneficial to add a defensive validation in 
BaseSparkAction.newStaticTable() itself, since it currently assumes that 
metadata.metadataFileLocation() is always present:
   
   protected Table newStaticTable(TableMetadata metadata, FileIO io) {
       StaticTableOperations ops = new StaticTableOperations(metadata, io);
       return new BaseTable(ops, metadata.metadataFileLocation());
   }
   
   At the moment, if metadataFileLocation() is null, the code eventually fails 
with a NullPointerException, which makes the real root cause difficult to 
identify.
   
   A fail-fast validation with an explicit error message such as:
   
   “Table metadata file location is null. This may indicate an invalid catalog 
implementation or an invalid REST Catalog response missing the 
metadata-location field.”
   
   would make debugging much easier for catalog implementers.
   
   This would not change the behavior of valid implementations, but it would 
provide a much clearer diagnostic when the catalog contract is violated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to