huberylee commented on code in PR #39393:
URL: https://github.com/apache/arrow/pull/39393#discussion_r1440369848


##########
cpp/src/parquet/column_reader.cc:
##########
@@ -1370,6 +1402,54 @@ class TypedRecordReader : public 
TypedColumnReaderImpl<DType>,
     return bytes_for_values;
   }
 
+  // Two parts different from original HasNextInternal:

Review Comment:
   - Within a page, the hit rows may not be contiguous. For a page that has 
already been hit, any non-hit rows within it need to be skipped using 
``SkipRecords``.  It may be clearer to look at this logic in conjunction with 
code comments ``column_reader.h:164~173``:
   ```c++
   //
   //   | <--------------------------- column chunk 
-------------------------------> |
   //             | <-------------------- page N -----------------------> |
   //        first_row_idx                                          last_row_idx
   //   |-- ... --|-------------------------------------------------------|--- 
... ---|
   //                       |---- range0 ----|         |---- range1 ----|
   //             |--skip0--|                |--skip1--|
   //             |------last_row_index0-----|
   //             |-------------------last_row_index1-------------------|
   //
   ```
   - During the execution of the ``SkipRecords`` method, it does indeed call 
the ``HasNextInternal`` method. The current implementation and testing have 
covered such scenarios, ensuring that no errors occur.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to