kbendick edited a comment on issue #4346:
URL: https://github.com/apache/iceberg/issues/4346#issuecomment-1083460470


   Agreed that making the source of the "actual files" list pluggable is 
orthogonal. My apologies for bringing it up here, as it's more related to 
"making `DeleteOrphanFiles` more reliable" by avoiding the list operation on the 
entire object store.
   
   Since I know that it's mostly working and rather simple, I would propose 
that we consider adding an option to supply the list of actual files from a 
source other than the Hadoop-based listing. Right now, that source is simply 
another table that can be referenced and that contains the actual files of the 
file store.
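   
   As a rough illustration of the idea (a Python sketch, not Iceberg's API; 
`find_orphans`, `from_inventory_rows`, and the `path` column are hypothetical 
names), the orphan computation only needs something that yields the actual 
file paths, whether that is a file system listing or another table:

```python
from typing import Callable, Iterable, Set

# Hypothetical type: any callable that yields the paths actually present in storage.
ActualFilesSource = Callable[[], Iterable[str]]


def find_orphans(referenced_files: Iterable[str],
                 actual_files: ActualFilesSource) -> Set[str]:
    """Return paths present in storage that the table metadata does not reference."""
    referenced = set(referenced_files)
    return {path for path in actual_files() if path not in referenced}


def from_inventory_rows(rows) -> ActualFilesSource:
    """Hypothetical source backed by rows of another table with a 'path' column."""
    return lambda: (row["path"] for row in rows)


# Example data: two files exist in storage, only one is referenced by the table.
inventory = [
    {"path": "s3://bucket/tbl/data/a.parquet"},
    {"path": "s3://bucket/tbl/data/b.parquet"},
]
orphans = find_orphans(["s3://bucket/tbl/data/a.parquet"],
                       from_inventory_rows(inventory))
```

Swapping in a different source only changes the callable passed as 
`actual_files`; the comparison itself stays the same.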
   
   Things like prefix normalization would be applied to the table's own listing 
of files (the files referenced by its metadata), which is outside the scope of 
providing a list of actual files.
   
   For example, for users on S3, normalizing the scheme (`s3a` vs. `s3`) of the 
files in the table is probably the most common concern. That normalization needs 
to be applied to the table's own files, where the scheme can be either `s3` or 
`s3a` per the manifest list. But the provided list of actual files would simply 
have one or the other.
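   
   A minimal sketch of the kind of scheme normalization meant here, in Python 
for brevity (the mapping and the `normalize_scheme` helper are assumptions for 
illustration, not existing Iceberg code):

```python
# Assumption: `s3a` and `s3` address the same object store, so map both to `s3`.
EQUIVALENT_SCHEMES = {"s3a": "s3"}


def normalize_scheme(path: str) -> str:
    """Rewrite an equivalent scheme to its canonical form; leave other paths alone."""
    scheme, sep, rest = path.partition("://")
    if not sep:
        # No scheme present (e.g. a bare local path): nothing to normalize.
        return path
    return EQUIVALENT_SCHEMES.get(scheme, scheme) + sep + rest


# The table's own files may mix schemes; the provided list uses just one.
table_files = ["s3a://bucket/tbl/data/a.parquet", "s3://bucket/tbl/data/b.parquet"]
normalized = sorted(normalize_scheme(p) for p in table_files)
```

Only the table's referenced files need this step; the provided list of actual 
files can be compared as-is once both sides agree on a scheme.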
   
   We can open a PR for review to better show what is meant. But I don't think 
that the normalization work needs to be completed before we make it pluggable 
in this way.
   
   Putting up the work will make the intent clearer, but it is in fact a 
rather small change that provides a very significant benefit (avoiding the 
listing of the entire file store if a more definitive source of truth is 
available).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


