Re: [PR] GH-39392: [C++][Parquet] Support page pruning [arrow]

via GitHub Wed, 03 Jan 2024 04:16:29 -0800


huberylee commented on code in PR #39393:
URL: https://github.com/apache/arrow/pull/39393#discussion_r1440385297



##########
cpp/src/parquet/column_reader.cc:
##########
@@ -1370,6 +1402,54 @@ class TypedRecordReader : public TypedColumnReaderImpl<DType>,
     return bytes_for_values;
   }
 
+  // Two parts different from original HasNextInternal:

Review Comment:
   > Yeah, I mean it limits the usage of `TypedColumnReader` and only allows 
internal skips. An external skip would introduce inconsistent skipping. In the 
current case, `Skip(skip_last)` would skip more than `skip_last`.
   
   First, it is strange to call ``SkipRecords`` relative to the hit rows (the 
rows that survive page pruning). Second, whether the behavior is consistent 
depends on how the semantics of ``SkipRecords`` are defined. If the skip count 
is relative to the hit rows, the current implementation can in theory guarantee 
consistency, although more tests are needed to verify this. If ``SkipRecords`` 
counts all rows in the page, then the existing implementation does have a 
problem.
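
   To make the two readings of the skip count concrete, here is a minimal 
sketch. It is a toy model and not the Arrow reader: `RowRange`, 
`SkipAmongHitRows`, and `SkipAmongAllRows` are hypothetical names, and the 
hit ranges are assumed to be sorted and non-overlapping.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model (not an Arrow type): an inclusive interval of
// selected ("hit") row indices within one page.
struct RowRange {
  int64_t first;
  int64_t last;
};

// Reading A: the skip count refers only to hit rows.
// Starting at physical row `pos`, skip `num_records` hit rows and return the
// physical index of the next hit row, or -1 if the hit rows are exhausted.
// Assumes `hits` is sorted and non-overlapping.
int64_t SkipAmongHitRows(const std::vector<RowRange>& hits, int64_t pos,
                         int64_t num_records) {
  assert(num_records >= 0);
  for (const RowRange& r : hits) {
    if (r.last < pos) continue;  // range lies entirely before the cursor
    const int64_t start = std::max(pos, r.first);
    const int64_t available = r.last - start + 1;
    if (num_records < available) return start + num_records;
    num_records -= available;  // consume this range and move to the next
  }
  return -1;
}

// Reading B: the skip count refers to every row in the page.
// Returns the physical index after the skip, or -1 if it runs past the page.
int64_t SkipAmongAllRows(int64_t page_rows, int64_t pos, int64_t num_records) {
  assert(num_records >= 0);
  const int64_t next = pos + num_records;
  return next < page_rows ? next : -1;
}
```

   For a 10-row page with hit ranges [2, 3] and [7, 8], skipping 3 records 
from row 0 lands on row 8 under reading A but on row 3 under reading B, so the 
two readings diverge as soon as pruned rows sit between the hit ranges.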




