huberylee commented on code in PR #39393:
URL: https://github.com/apache/arrow/pull/39393#discussion_r1446065192


##########
cpp/src/parquet/arrow/reader.cc:
##########
@@ -973,10 +998,15 @@ Status GetReader(const SchemaField& field, const std::shared_ptr<ReaderContext>&
 
 Status FileReaderImpl::GetRecordBatchReader(const std::vector<int>& row_groups,
                                             const std::vector<int>& column_indices,
+                                            const RowRangesOpt& row_ranges,
                                             std::unique_ptr<RecordBatchReader>* out) {
   RETURN_NOT_OK(BoundsCheck(row_groups, column_indices));
 
-  if (reader_properties_.pre_buffer()) {
+  // When row_ranges has a value, only the data of the hit pages should be loaded,

Review Comment:
   > If the IO coalescing process is predictable, we can still enable prebuffer 
here with some code change. If we need to implement this, some logic of 
ChunkBufferedInputStream should be refactored out to be shared by prebuffer.
   
   Yes. The current implementation avoids prebuffering entirely, regardless of whether pre_buffer is enabled.
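   To illustrate the idea above, here is a minimal, self-contained sketch of the kind of IO coalescing the quoted comment refers to: given the byte ranges of the hit pages, nearby ranges are merged into a small, predictable set of reads that a prebuffer pass could issue. This is not Arrow's actual implementation; `ByteRange`, `CoalesceRanges`, and `hole_size` are hypothetical names for illustration only.
   
   ```cpp
   #include <algorithm>
   #include <cstdint>
   #include <vector>
   
   // Hypothetical: a contiguous byte range in the Parquet file (e.g. one data page).
   struct ByteRange {
     int64_t offset;
     int64_t length;
   };
   
   // Merge ranges whose gap is at most `hole_size` bytes into single reads,
   // so the set of IOs needed for the hit pages is known up front.
   std::vector<ByteRange> CoalesceRanges(std::vector<ByteRange> ranges,
                                         int64_t hole_size) {
     if (ranges.empty()) return ranges;
     std::sort(ranges.begin(), ranges.end(),
               [](const ByteRange& a, const ByteRange& b) {
                 return a.offset < b.offset;
               });
     std::vector<ByteRange> out{ranges.front()};
     for (size_t i = 1; i < ranges.size(); ++i) {
       ByteRange& last = out.back();
       const int64_t last_end = last.offset + last.length;
       if (ranges[i].offset - last_end <= hole_size) {
         // Gap is small enough: widen the previous read to cover this range too.
         const int64_t new_end =
             std::max(last_end, ranges[i].offset + ranges[i].length);
         last.length = new_end - last.offset;
       } else {
         out.push_back(ranges[i]);
       }
     }
     return out;
   }
   ```
   
   With page ranges `{0,100}`, `{110,50}`, `{1000,10}` and a 32-byte hole tolerance, the first two merge into one read of 160 bytes while the distant third stays separate, which is the predictability that would let prebuffer and a chunked input stream share one coalescing step.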



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]