aokolnychyi edited a comment on issue #4346: URL: https://github.com/apache/iceberg/issues/4346#issuecomment-1083412195
@kbendick, I think being able to customize `DeleteOrphanFiles` with a custom way of obtaining the actual files is a great feature, but I find it orthogonal to this issue. No matter how we obtain the actual files, we still need to normalize the locations and decide what to do if the scheme or authority doesn't match. I'd consider exposing some sort of strategy in `DeleteOrphanFiles` that would let users customize not only how to obtain the actual files but also other things (e.g. how to perform normalization). For example, I find it reasonable to use Hadoop for normalization if we use Hadoop for listing. If we rely on another way of computing the actual files, maybe we should use something else for normalizing as well. I think once we know what to do for the current implementation, we can think of a way to make it pluggable.
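To make the strategy idea a bit more concrete, here is a rough sketch of what it could look like. Nothing in it is an existing Iceberg API: `OrphanFileStrategy`, `normalizeWithUri`, and `findOrphans` are hypothetical names, `java.net.URI` stands in for whatever normalization matches the listing mechanism, and the map of equivalent schemes (`s3a` and `s3n` treated as `s3`) is an assumption made only for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these types or methods exist in Iceberg.
final class OrphanFileStrategySketch {

  /** Hypothetical strategy bundling file listing with the matching normalization. */
  interface OrphanFileStrategy {
    List<String> listActualFiles(String tableLocation);

    String normalize(String location);
  }

  /**
   * Normalizes a location with java.net.URI: lower-cases the scheme, maps
   * schemes declared equivalent (an assumption, e.g. s3a -> s3) to one name,
   * and keeps the authority and path unchanged.
   */
  static String normalizeWithUri(String location, Map<String, String> equalSchemes) {
    URI uri = URI.create(location);
    String scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
    String authority = uri.getAuthority() == null ? "" : uri.getAuthority();
    return equalSchemes.getOrDefault(scheme, scheme) + "://" + authority + uri.getPath();
  }

  /** Returns actual files whose normalized location matches no valid (referenced) file. */
  static List<String> findOrphans(
      OrphanFileStrategy strategy, String tableLocation, List<String> validFiles) {
    Set<String> valid = new HashSet<>();
    for (String file : validFiles) {
      valid.add(strategy.normalize(file));
    }
    List<String> orphans = new ArrayList<>();
    for (String file : strategy.listActualFiles(tableLocation)) {
      if (!valid.contains(strategy.normalize(file))) {
        orphans.add(file);
      }
    }
    return orphans;
  }

  public static void main(String[] args) {
    Map<String, String> equalSchemes = Map.of("s3a", "s3", "s3n", "s3");
    OrphanFileStrategy strategy = new OrphanFileStrategy() {
      @Override
      public List<String> listActualFiles(String tableLocation) {
        // Stand-in for a real listing (Hadoop FileSystem, an inventory report, ...).
        return List.of(tableLocation + "/data/a.parquet", tableLocation + "/data/b.parquet");
      }

      @Override
      public String normalize(String location) {
        return normalizeWithUri(location, equalSchemes);
      }
    };
    List<String> valid = List.of("s3://bucket/tbl/data/a.parquet");
    System.out.println(findOrphans(strategy, "s3a://bucket/tbl", valid));
  }
}
```

In this sketch the metadata references `a.parquet` under `s3://` while the listing returns it under `s3a://`, so only `b.parquet` is reported as an orphan. What to do when the scheme or authority differs and is not declared equivalent is exactly the open question above; the sketch simply treats such files as non-matching.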
