huberylee commented on code in PR #39393:
URL: https://github.com/apache/arrow/pull/39393#discussion_r1440369848
##########
cpp/src/parquet/column_reader.cc:
##########
@@ -1370,6 +1402,54 @@ class TypedRecordReader : public
TypedColumnReaderImpl<DType>,
return bytes_for_values;
}
+ // Two parts different from original HasNextInternal:
Review Comment:
- Within a page, the hit rows may not be contiguous. For a page that has
already been hit, any non-hit rows within it need to be skipped using
``SkipRecords``. It may be clearer to look at this logic in conjunction with
code comments ``column_reader.h:164~173``:
```
//
// | <--------------------------- column chunk
-------------------------------> |
// | <-------------------- page N -----------------------> |
// first_row_idx
last_row_idx
// |-- ...
--|-------------------------------------------------------|--- ... ---|
// |---- range0 ----| |---- range1 ----|
// |--skip0--| |--skip1--|
// |------last_row_index0-----|
// |-------------------last_row_index1-------------------|
//
```
- During the execution of the ``SkipRecords`` method, it does indeed call
the ``HasNextInternal`` method. The current implementation and testing have
covered such scenarios, ensuring that no errors occur.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]