rdblue commented on pull request #2220: URL: https://github.com/apache/iceberg/pull/2220#issuecomment-774378949
> The current default (unbound file sizes) will never take advantage of any predicate push down

I'm not sure I understand what you're saying here. Why would this prevent predicate pushdown?

Large files with unordered data may have larger and larger ranges, but that happens quickly in even a single row group. To get effective file pruning, you need to cluster data by filter columns. If you're doing that, then I would say that larger files _diminish_ the benefit of pushdown, but don't preclude it. And, parallelism concerns typically force people to create small files because writes are faster that way.

I'm not sure this is needed, although I don't really have a problem with adding it. I'd also like to hear what Anton thinks.

@szehon-ho, good to see you in this community!
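To make the point about ranges concrete, here is a rough toy sketch in Python. It is not Iceberg or Parquet code: the helper names `split_into_files` and `files_scanned`, the row counts, and the filter range are all made up for illustration, and "pruning" is modeled only as a per-file min/max overlap check on a single column. It compares how many files a range filter has to read when the data is unordered versus clustered (sorted) on the filter column, at two file sizes.

```python
import random


def split_into_files(values, rows_per_file):
    """Chunk values into consecutive 'files' of rows_per_file rows each."""
    return [values[i:i + rows_per_file]
            for i in range(0, len(values), rows_per_file)]


def files_scanned(files, lo, hi):
    """Count files whose [min, max] range overlaps the filter range [lo, hi].

    A file can be skipped only when its min/max range misses the filter.
    """
    return sum(1 for f in files if min(f) <= hi and max(f) >= lo)


# Hypothetical data: 100,000 rows with values drawn uniformly at random.
random.seed(0)
values = [random.randrange(1_000_000) for _ in range(100_000)]

# Hypothetical filter: 500_000 <= col <= 500_999
for rows_per_file in (1_000, 10_000):
    unordered = split_into_files(values, rows_per_file)
    clustered = split_into_files(sorted(values), rows_per_file)
    print(
        f"rows_per_file={rows_per_file}: "
        f"unordered scans {files_scanned(unordered, 500_000, 500_999)} "
        f"of {len(unordered)} files, "
        f"clustered scans {files_scanned(clustered, 500_000, 500_999)} "
        f"of {len(clustered)} files"
    )
```

In this model the unordered layout gets essentially no pruning even with small files, because each file's min/max already spans nearly the whole value range. With clustered data only the files around the filter range are read at either file size, though with larger files each file read carries more rows that do not match.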
