ChristinaTech commented on issue #7151:
URL: https://github.com/apache/iceberg/issues/7151#issuecomment-1494931953

   @ryanyuan @c0d3monk If the issue does occur, the way to recover the 
table is stated in the overview: make a manual Glue `UpdateTable` API call that 
sets the `metadata_location` property to the value of 
`previous_metadata_location`. But that's not so much a workaround as a recovery 
step once it happens.
   
   Considering the missing metadata failure condition is pretty easy to detect 
in code once it happens via catching `NotFoundException`, it would technically 
be possible to automate this fix and then retry the job, though you would have 
to be careful not to change anything else in the Glue Table metadata in the 
process to be safe.
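
A rough sketch of what that automated recovery could look like, assuming a boto3 Glue client is passed in. The function names and the `TABLE_INPUT_KEYS` constant are made up for illustration and are not part of Iceberg. It copies only the fields that `TableInput` accepts from the `GetTable` response and changes nothing except `metadata_location`:

```python
# Fields accepted by Glue's TableInput. GetTable returns extra read-only
# fields (CreateTime, DatabaseName, ...) that UpdateTable rejects.
TABLE_INPUT_KEYS = (
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
)


def build_rollback_table_input(table):
    """Build a TableInput from a GetTable 'Table' dict, pointing
    metadata_location back at previous_metadata_location."""
    params = dict(table.get("Parameters", {}))
    previous = params.get("previous_metadata_location")
    if not previous:
        raise ValueError("no previous_metadata_location to roll back to")
    params["metadata_location"] = previous

    # Copy everything else through untouched so no other table metadata changes.
    table_input = {k: table[k] for k in TABLE_INPUT_KEYS if k in table}
    table_input["Parameters"] = params
    return table_input


def rollback_metadata_location(glue, database, name):
    """glue is assumed to be a boto3 Glue client (boto3.client('glue'))."""
    table = glue.get_table(DatabaseName=database, Name=name)["Table"]
    table_input = build_rollback_table_input(table)
    glue.update_table(DatabaseName=database, TableInput=table_input)
    return table_input["Parameters"]["metadata_location"]
```

You would call something like `rollback_metadata_location(boto3.client("glue"), "my_db", "my_table")` from the handler that catches the `NotFoundException`, then retry the job. Check that the previous metadata file still exists in S3 before relying on this.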
   
   As for a workaround that avoids ending up in this situation in the first 
place, some potential options besides my pending upstream fix are to:
   1. Set the Catalog Option 
[`s3.delete-enabled`](https://iceberg.apache.org/docs/1.2.0/aws/#s3-tags) to 
`false` so the step that actually corrupts the table becomes a no-op. If you do 
this though, you will orphan any files the system attempts to delete/expire, so 
make sure you don't have that option set in whatever context you use to run 
[DeleteOrphanFiles](https://iceberg.apache.org/docs/1.2.0/maintenance/#delete-orphan-files).
   2. If you want to use the prior option but limit what gets Orphaned, you can 
potentially extend 
[S3FileIO](https://github.com/apache/iceberg/blob/master/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java)
 with a version where `deleteFile` no-ops for metadata files, then specify that 
modified version of S3FileIO for your `io-impl` parameter.
   3. Provide a [custom AWS Client 
factory](https://iceberg.apache.org/docs/1.2.0/aws/#aws-client-customization) 
that disables API retries for the Glue API client. The downside, as documented 
earlier, is:
   > This hurts reliability in normal usage to an unacceptable degree.
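
For the first option, a minimal sketch of the Spark catalog settings, assuming a Spark session backed by `GlueCatalog` and `S3FileIO`. The helper name `glue_catalog_conf` and the catalog name and warehouse path in the usage note are placeholders:

```python
def glue_catalog_conf(catalog, warehouse):
    """Return Spark conf entries for a Glue-backed Iceberg catalog with
    S3 deletes disabled (catalog option s3.delete-enabled=false)."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": warehouse,
        # Turns S3FileIO deletes into a no-op; deleted/expired files are orphaned.
        f"{prefix}.s3.delete-enabled": "false",
    }
```

Each entry from `glue_catalog_conf("my_catalog", "s3://my-bucket/warehouse")` can be passed as a `--conf key=value` pair or set on the session builder. Use a separate catalog configuration without this setting for the job that runs DeleteOrphanFiles.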
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


