ZENOTME commented on issue #1604: URL: https://github.com/apache/iceberg-rust/issues/1604#issuecomment-3206448551

Thanks for the suggestions from @Fokko and @liurenjie1024, I learned a lot!

> That's true, but not always the case. For example, Spark leverages distributed planning. Each manifest file has a target size of 8MB, which could be dispatched for distributed planning.

This is close to my initial thought for the distributed case. We can push the row-group pruning down to the executors and, at the same time, cache the metadata they read there, so we don't have to pay the cost of opening the data file twice. However, the number of data files is much larger than the number of manifest files, so the cache size needs to be taken into account. This does make things more complex, and the benefit is still uncertain.

Let's stick to size-based planning for now. I'd like to implement it and benchmark it again to see what happens.
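To make the cache concern above concrete, here is a minimal sketch of the kind of bounded footer cache an executor could keep so that row-group pruning and the later read phase share one metadata fetch. The `FileMetadata` and `FooterCache` types, the FIFO eviction, and the example paths are all hypothetical placeholders for illustration; they are not iceberg-rust or parquet crate APIs.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a parsed Parquet footer (row-group stats, byte ranges, ...).
#[derive(Clone)]
struct FileMetadata {
    row_group_byte_ranges: Vec<(u64, u64)>,
}

/// Bounded cache keyed by data file path. An executor that prunes row groups
/// stores the footer here so the read phase does not re-open the data file.
struct FooterCache {
    capacity: usize,
    // Simple FIFO eviction for the sketch; a real cache would likely use LRU
    // and account for entry size, since data files vastly outnumber manifests.
    order: VecDeque<String>,
    entries: HashMap<String, Arc<FileMetadata>>,
}

impl FooterCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, order: VecDeque::new(), entries: HashMap::new() }
    }

    fn get(&self, path: &str) -> Option<Arc<FileMetadata>> {
        self.entries.get(path).cloned()
    }

    fn insert(&mut self, path: String, meta: Arc<FileMetadata>) {
        if self.entries.contains_key(&path) {
            return;
        }
        if self.entries.len() >= self.capacity {
            if let Some(evicted) = self.order.pop_front() {
                self.entries.remove(&evicted);
            }
        }
        self.order.push_back(path.clone());
        self.entries.insert(path, meta);
    }
}

fn main() {
    let cache = Arc::new(Mutex::new(FooterCache::new(2)));

    // Pruning phase: read the footer once and remember it.
    let meta = Arc::new(FileMetadata { row_group_byte_ranges: vec![(4, 1024), (1028, 2048)] });
    cache.lock().unwrap().insert("s3://bucket/data/a.parquet".to_string(), meta);

    // Read phase on the same executor: reuse the cached footer instead of
    // opening the data file a second time.
    if let Some(hit) = cache.lock().unwrap().get("s3://bucket/data/a.parquet") {
        println!("reusing footer with {} row groups", hit.row_group_byte_ranges.len());
    }
}
```

Even with a bounded cache like this, sizing it well is the hard part given how many data files a scan can touch, which is part of why size-based planning seems like the safer first step.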
Thanks for the suggestions from @Fokko and @liurenjie1024, I learned a lot! > That's true, but not always the case. For exameple, Spark leverages distributed planning. Each manifest file has a target size of 8MB, which could be dispatched for distributed planning. It's close to my initial thought for the distributed case. Wee can push the row group pruning down to the executors, and at the same time cache the metadata they read there, so we don’t have to pay the cost of opening the data file twice. But number of data files is much more than manifest files so the cache size need to be consideration. This does make things more complex and the benefit is still uncertain. Let’s stick to the size-based planning for now. I'd like to implement it and bench it again to see what happen. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org