kbendick commented on issue #4346:
URL: https://github.com/apache/iceberg/issues/4346#issuecomment-1083460470


   Agreed that making the source of the "actual files" list pluggable is 
orthogonal.
   
   I would propose, since I know that it's mostly working and rather simple, 
that we consider adding a way to supply a source other than the hadoop listing 
as an additional option. Right now, it's simply another table that can be 
referenced and that contains the actual files of the file store.
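   
   As a rough illustration of the idea (not the actual implementation), the 
sketch below takes a caller-provided list of actual files instead of listing 
the file store, and reports the files the table does not reference. The 
function name `find_orphan_files` and the sample paths are hypothetical.

```python
from typing import Iterable, Set


def find_orphan_files(actual_files: Iterable[str],
                      referenced_files: Iterable[str]) -> Set[str]:
    """Return files that exist in the store but are not referenced by the table.

    `actual_files` stands in for the pluggable source (for example, rows read
    from another table), so no listing of the file store is needed here.
    """
    return set(actual_files) - set(referenced_files)


# Hypothetical sample data, for illustration only.
actual = [
    "s3://bucket/data/a.parquet",
    "s3://bucket/data/b.parquet",
    "s3://bucket/data/c.parquet",
]
referenced = [
    "s3://bucket/data/a.parquet",
    "s3://bucket/data/b.parquet",
]

orphans = find_orphan_files(actual, referenced)
```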
   
   Things like prefix normalization would be applied to the files referenced 
by the table, which is outside the scope of providing a list of actual files.
   
   For example, for users on s3, normalizing the scheme (`s3a` vs. `s3`) of 
the files in the table is probably the most common concern. That would need to 
be applied to the table's own files, where the scheme can be either `s3` or 
`s3a`. But the provided list of actual files would simply have one or the 
other.
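   
   A minimal sketch of that kind of scheme normalization, assuming `s3a` 
should be treated as equivalent to `s3`. The helper `normalize_scheme` and the 
`EQUIVALENT_SCHEMES` mapping are hypothetical names used only for 
illustration.

```python
# Assumption: `s3a` locations should compare equal to `s3` locations.
EQUIVALENT_SCHEMES = {"s3a": "s3"}


def normalize_scheme(location: str, equivalent=EQUIVALENT_SCHEMES) -> str:
    """Rewrite the scheme of `location` to its canonical form, if it has one."""
    scheme, sep, rest = location.partition("://")
    if not sep:
        # No scheme present; leave the location untouched.
        return location
    return equivalent.get(scheme, scheme) + sep + rest


# The table's own files may mix both schemes, so they are normalized
# before being compared with the provided list of actual files.
table_files = [
    "s3a://bucket/data/a.parquet",
    "s3://bucket/data/b.parquet",
]
normalized = {normalize_scheme(f) for f in table_files}
```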
   
   We can open a PR for review to better show what is meant. But I don't think 
that the normalization work needs to be completed before we make it pluggable 
in this way.
   
   Putting up the work will make the intent clearer, but it is in fact a 
rather small change that provides a very significant benefit (avoiding the 
listing of the file store).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
