advancedxy commented on code in PR #2615:
URL: https://github.com/apache/iceberg-rust/pull/2615#discussion_r3393816511
##########
crates/iceberg/src/arrow/reader/row_filter.rs:
##########
@@ -160,8 +160,14 @@ impl ArrowReader {
/// Filters row groups by byte range to support Iceberg's file splitting.
///
- /// Iceberg splits large files at row group boundaries, so we only read row groups
- /// whose byte ranges overlap with [start, start+length).
+ /// External engines (e.g. Spark via Comet) split a data file into multiple scan tasks,
Review Comment:
I think the comment could be updated to reflect the actual behavior: in most
(normal) cases, Iceberg Parquet files are split at row group boundaries. A
Parquet file is only split at the requested size if the splitOffsets metadata
is missing during planning.
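
To illustrate the two cases, here is a rough sketch of the planning-side
behavior described above. It is not the actual Iceberg planner: `plan_splits`,
its parameters, and the `(start, length)` tuple shape are hypothetical names
chosen for illustration, and real planning may also combine adjacent row
groups up to the target size, which this sketch skips.

```rust
/// Hypothetical helper (not an iceberg-rust API): returns `(start, length)`
/// byte ranges for the scan tasks of a single data file.
fn plan_splits(
    file_len: u64,
    split_offsets: Option<&[u64]>,
    target_size: u64,
) -> Vec<(u64, u64)> {
    match split_offsets {
        // Normal case: split offsets are recorded, so every split starts at a
        // row group boundary and runs to the next boundary (or end of file).
        // Assumes the offsets are sorted ascending.
        Some(offsets) if !offsets.is_empty() => offsets
            .iter()
            .enumerate()
            .map(|(i, &start)| {
                let end = offsets.get(i + 1).copied().unwrap_or(file_len);
                (start, end.saturating_sub(start))
            })
            .collect(),
        // Fallback: no split offsets, so the file is cut at the requested
        // size and a split may start in the middle of a row group.
        _ => {
            let step = target_size.max(1); // guard against a zero step
            let mut splits = Vec::new();
            let mut start = 0u64;
            while start < file_len {
                let len = step.min(file_len - start);
                splits.push((start, len));
                start += len;
            }
            splits
        }
    }
}
```

Only the fallback branch produces ranges that are not aligned to row groups,
which is the case the reader-side filtering has to handle.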
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]