jordepic commented on PR #2615: URL: https://github.com/apache/iceberg-rust/pull/2615#issuecomment-4679951054
Thanks Matt! I ran into this issue yesterday from Comet. Using the midpoint for row group -> split assignment is what the parquet-java library does. For context, the FileScanTask API allows providing custom split ranges. However, that doesn't really make sense with huge Parquet row groups, where many of the ranges intersect a single row group: we'd read back duplicate data, which is bad!
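To illustrate the midpoint rule mentioned above, here is a minimal sketch. It is not the code from this PR or from parquet-java: `RowGroupRange` and `row_groups_for_split` are hypothetical names, and a row group is reduced to a byte offset and length. The idea is that a row group belongs to the one split whose byte range contains the row group's midpoint, so each row group is read exactly once even when several splits overlap it.

```rust
/// Hypothetical, simplified view of a Parquet row group: its byte range in the file.
/// (Illustration only; not a type from iceberg-rust or the parquet crate.)
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RowGroupRange {
    pub start: u64,
    pub len: u64,
}

impl RowGroupRange {
    /// Byte offset of the middle of the row group.
    pub fn midpoint(&self) -> u64 {
        self.start + self.len / 2
    }
}

/// Returns the indices of the row groups whose midpoint falls inside the
/// half-open split range [split_start, split_start + split_len).
pub fn row_groups_for_split(
    row_groups: &[RowGroupRange],
    split_start: u64,
    split_len: u64,
) -> Vec<usize> {
    let split_end = split_start + split_len;
    row_groups
        .iter()
        .enumerate()
        .filter(|(_, rg)| {
            let mid = rg.midpoint();
            mid >= split_start && mid < split_end
        })
        .map(|(idx, _)| idx)
        .collect()
}
```

With one large row group spanning several splits, only the split containing its midpoint selects it. The other splits select nothing for that row group, so no data is read twice.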
