szehon-ho edited a comment on issue #4346:
URL: https://github.com/apache/iceberg/issues/4346#issuecomment-1076850852


   Yeah, actually the idea is the same as your original, except that we:
   - rename the flag
   - add the error mode (to inform users that they were about to delete valid
     files)
   - combine the authority/scheme into 'prefix'
   
   Maybe rename ```delete``` to ```force-delete``` to make it clear that users
   should not do this unless they are sure.
   
   For the algorithm, yeah, that works.  If we want to optimize further, we
   could even conditionally skip the prefix-less comparison in non-error modes,
   as in the original algorithm.
   
   I'm open to revisiting this if error mode proves too cumbersome to be
   useful.  Initially I was thinking of it as a safety check that users must
   turn off if they are trying to delete at a different absolute location than
   the one the files were written with.  Users could go back to running
   RemoveOrphan with the default 'error' mode once the table is fixed so that
   all locations use the new prefix.  Maybe that fix could come via
   RepairManifests with an option to rewrite the prefix, or, once relative
   paths are supported, we could change the root location.
   
   Yeah, I didn't initially think to distinguish scheme from authority; I had
   just a single prefix, which counts as different if either one differs.  When
   the bucket is different, we should throw an exception, right?  (The user is
   trying to clean a different bucket than the one the table initially wrote
   to.)  Though for the other case, where only the scheme differs (s3 => s3a),
   I can see that it's debatable.  I was thinking we would avoid too many
   details in the config and just have users set prefix-mismatch-mode='ignore'
   in this case, but we could add another flag if we really need it.  Again,
   the user could eventually fix s3 to s3a in the paths, using some of these
   to-be-developed features.
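
To make the proposal concrete, here is a rough Python sketch of how the prefix-less comparison and the mismatch modes might fit together. It is only an illustration under assumptions, not the actual Iceberg implementation: the names `PrefixMismatchMode`, `split_location`, and `find_orphans` are hypothetical, the mode values follow the 'error' / 'ignore' / `delete` names discussed in this thread, and the exact semantics of 'ignore' and 'delete' are my reading of the discussion.

```python
from enum import Enum
from urllib.parse import urlsplit


class PrefixMismatchMode(Enum):
    # Hypothetical mode names, taken from the values discussed in the thread.
    ERROR = "error"    # fail, telling the user that valid files would be deleted
    IGNORE = "ignore"  # assumed: treat a prefix mismatch as the same file
    DELETE = "delete"  # assumed: treat the listed file as an orphan ('force-delete')


def split_location(location):
    """Split a location into (prefix, path); prefix is (scheme, authority)."""
    parts = urlsplit(location)
    return (parts.scheme, parts.netloc), parts.path


def find_orphans(listed, valid, mode=PrefixMismatchMode.ERROR):
    """Return the listed locations that are not referenced by the table."""
    # Index valid files by prefix-less path, remembering every prefix seen.
    valid_prefixes_by_path = {}
    for location in valid:
        prefix, path = split_location(location)
        valid_prefixes_by_path.setdefault(path, set()).add(prefix)

    orphans = []
    for location in listed:
        prefix, path = split_location(location)
        prefixes = valid_prefixes_by_path.get(path)
        if prefixes is None:
            orphans.append(location)  # no valid file shares this path
        elif prefix in prefixes:
            continue  # exact match: the file is referenced by the table
        elif mode is PrefixMismatchMode.ERROR:
            raise ValueError(
                f"Prefix mismatch for {location}: the path matches a valid "
                f"file written under a different scheme/authority"
            )
        elif mode is PrefixMismatchMode.DELETE:
            orphans.append(location)
        # IGNORE: same path under a different prefix counts as the same file.
    return orphans
```

With a table written under `s3://bucket/...` and a listing done under `s3a://bucket/...`, the default mode would raise, 'ignore' would keep the matching files, and 'delete' would report them as orphans. A separate flag for scheme-only mismatches would amount to comparing the two halves of the prefix tuple independently.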
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


