rdblue commented on pull request #2220:
URL: https://github.com/apache/iceberg/pull/2220#issuecomment-774378949


   > The current default (unbound file sizes) will never take advantage of any 
predicate push down
   
   I'm not sure I understand what you're saying here. Why would this prevent 
predicate pushdown? Large files with unordered data may have larger and larger 
column value ranges, but that happens quickly in even a single row group. To get 
effective file pruning, you need to cluster data by filter columns. If you're 
doing that, then I would say that larger files _diminish_ the benefit of 
pushdown, but don't preclude it. And parallelism concerns typically force people 
to create small files because writes are faster that way.
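
   As a rough illustration of the point about ranges and clustering, here is a toy sketch (not Iceberg code). The helper names `file_ranges` and `files_to_scan`, the row counts, and the equality filter are all hypothetical. It assumes pruning uses only a per-file min/max for one column.

```python
import random

def file_ranges(values, rows_per_file):
    """Split values into consecutive 'files' and return each file's (min, max)."""
    chunks = (values[i:i + rows_per_file]
              for i in range(0, len(values), rows_per_file))
    return [(min(chunk), max(chunk)) for chunk in chunks]

def files_to_scan(ranges, target):
    """Count files whose min/max range could contain target (cannot be pruned)."""
    return sum(1 for lo, hi in ranges if lo <= target <= hi)

# Hypothetical data: 10,000 distinct values of the filter column.
clustered = list(range(10_000))          # data clustered (sorted) by the filter column
unordered = clustered[:]
random.Random(0).shuffle(unordered)      # same data, no clustering

target = 5_000  # equality filter: col = 5000

# Clustered data: pruning works at either file size, but large files read more rows.
clustered_small = files_to_scan(file_ranges(clustered, 100), target)
clustered_large = files_to_scan(file_ranges(clustered, 5_000), target)
rows_read_small = clustered_small * 100
rows_read_large = clustered_large * 5_000

# Unordered data: even small files have wide ranges, so almost nothing is pruned.
unordered_small = files_to_scan(file_ranges(unordered, 100), target)

print(clustered_small, rows_read_small)
print(clustered_large, rows_read_large)
print(unordered_small)
```

   In this toy setup, clustered data lets the filter skip all but one file at either file size, although the larger file means more rows are read. With unordered data, even 100-row files cover nearly the whole value range, so file size is not what defeats pruning.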
   
   I'm not sure this is needed, although I don't really have a problem with 
adding it. I'd also like to hear what Anton thinks.
   
   @szehon-ho, good to see you in this community!




