kbendick commented on issue #4346: URL: https://github.com/apache/iceberg/issues/4346#issuecomment-1083460470
Agreed that making the source of the "actual files" list pluggable is orthogonal. Since I know that it's mostly working and rather simple, I would propose that we consider adding a way to supply a source other than the Hadoop listing as an additional option. Right now, it's simply another table that can be referenced and that contains the actual files of the file store.

Things like prefix normalization would be applied to the listing of files in the table, which would be outside the scope of providing a list of actual files. For example, for users on S3, normalizing the files in the table on the scheme of `s3a` or `s3` is probably the most common concern. That would need to be applied to the table's own files, where the scheme can be either `s3` or `s3a`. But the provided list of actual files would simply have one or the other.

We can open a PR for review to better show what is meant. But I don't think that the normalization work needs to be completed before we make it pluggable in this way. It will be clearer what is meant once the work is up, but it is in fact a rather small change that provides a very significant benefit (avoiding the listing of the file store).
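
As a rough illustration of the idea (a sketch only, not the Iceberg implementation or its API), the comparison could look like the following. The names `EQUIVALENT_SCHEMES`, `normalize_scheme`, and `find_orphans` are hypothetical, and the sketch assumes both lists are plain path strings and that `s3a` can be treated as equivalent to `s3`. It normalizes both sides for simplicity, even though the supplied list of actual files would normally use a single scheme.

```python
# Hypothetical mapping: schemes treated as equivalent when comparing paths.
EQUIVALENT_SCHEMES = {"s3a": "s3"}


def normalize_scheme(path):
    """Rewrite an equivalent scheme (e.g. s3a) to its canonical form (s3)."""
    scheme, sep, rest = path.partition("://")
    if not sep:
        # No scheme present; leave the path untouched.
        return path
    return EQUIVALENT_SCHEMES.get(scheme, scheme) + sep + rest


def find_orphans(referenced_files, actual_files):
    """Return actual files that the table does not reference.

    referenced_files: paths taken from the table's own metadata, where the
        scheme may be either s3 or s3a.
    actual_files: paths from the pluggable source (for example another
        table), used instead of listing the file store.
    """
    referenced = {normalize_scheme(p) for p in referenced_files}
    return sorted(p for p in actual_files if normalize_scheme(p) not in referenced)


referenced = ["s3a://bucket/data/a.parquet", "s3://bucket/data/b.parquet"]
actual = [
    "s3://bucket/data/a.parquet",
    "s3://bucket/data/b.parquet",
    "s3://bucket/data/c.parquet",
]
orphans = find_orphans(referenced, actual)
```

Here only `c.parquet` is absent from the table's references, so it is the one file reported, and no listing of the file store is needed to find it.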
