jordepic commented on PR #2615:
URL: https://github.com/apache/iceberg-rust/pull/2615#issuecomment-4679951054

   Thanks Matt! I ran into this issue just yesterday from Comet. Assigning each 
row group to the split that contains its midpoint is what the parquet-java 
library does.
   
   For context, the FileScanTask API allows providing custom split ranges. 
However, this doesn't really make sense in scenarios with huge Parquet row 
groups, where many of the ranges intersect one row group. If every intersecting 
split reads that row group, we read back duplicate data, which is bad!
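
   To illustrate the midpoint rule described above, here is a minimal sketch. 
It is not the code from this PR or from parquet-java: `RowGroupExtent`, 
`SplitRange`, `overlaps`, and `owns_row_group` are hypothetical names, and the 
byte offsets are made up. It only shows that a row group spanning several 
split ranges is claimed by exactly one of them when ownership is decided by 
where its midpoint falls.

```rust
/// Hypothetical byte extent of one row group inside a Parquet file.
#[derive(Debug, Clone, Copy)]
struct RowGroupExtent {
    start: u64,
    len: u64,
}

/// Hypothetical half-open split range [start, start + len).
#[derive(Debug, Clone, Copy)]
struct SplitRange {
    start: u64,
    len: u64,
}

/// Overlap-based selection: true for every split that touches the row group.
/// With large row groups, several splits can all return true.
fn overlaps(split: SplitRange, rg: RowGroupExtent) -> bool {
    rg.start < split.start + split.len && split.start < rg.start + rg.len
}

/// Midpoint-based selection: true only for the split whose range contains
/// the row group's midpoint, so at most one non-overlapping split owns it.
fn owns_row_group(split: SplitRange, rg: RowGroupExtent) -> bool {
    let midpoint = rg.start + rg.len / 2;
    midpoint >= split.start && midpoint < split.start + split.len
}

fn main() {
    // One large row group (bytes 4..1004) and four 256-byte splits.
    let rg = RowGroupExtent { start: 4, len: 1000 };
    let splits = [
        SplitRange { start: 0, len: 256 },
        SplitRange { start: 256, len: 256 },
        SplitRange { start: 512, len: 256 },
        SplitRange { start: 768, len: 256 },
    ];

    let overlapping = splits.iter().filter(|s| overlaps(**s, rg)).count();
    let owning = splits.iter().filter(|s| owns_row_group(**s, rg)).count();

    // All four splits touch the row group, but only one contains its midpoint.
    assert_eq!(overlapping, 4);
    assert_eq!(owning, 1);
    println!("overlapping splits: {overlapping}, owning splits: {owning}");
}
```

   The midpoint here is 4 + 1000 / 2 = 504, which falls in the second split 
(bytes 256..512), so only that split would read the row group.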


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

