[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #3616: [feat] Support using offset index in ParquetRecordBatchStream when pu…

GitBox Wed, 28 Sep 2022 00:36:34 -0700


tustvold commented on code in PR #3616:
URL: https://github.com/apache/arrow-datafusion/pull/3616#discussion_r982047518



##########
datafusion/core/src/physical_plan/file_format/parquet.rs:
##########
@@ -78,6 +79,10 @@ pub struct ParquetScanOptions {
     /// If true, the generated `RowFilter` may reorder the predicate `Expr`s to try and optimize
     /// the cost of filter evaluation.
     reorder_predicates: bool,
+    /// If true, the reader will read pageIndex, If exit, first we can use it create the `RowSelector`
+    /// before read the file, Second with pageIndex it will accelerate skip records (avoid decode pageHeader)
+    /// when reading values from chunk with `RowSelector`.

Review Comment:
   ```suggestion
       /// If enabled, the reader will read the page index.
       /// This is used to optimise filter pushdown
       /// via `RowSelector` and `RowFilter` by
       /// eliminating unnecessary IO and decoding.
   ```
   
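To illustrate why a page index helps here, below is a minimal, self-contained sketch of the general idea: per-page min/max statistics let a reader decide which pages cannot match a range predicate, and turn that into alternating skip/select runs of rows. The names `PageStats`, `RowRun` and `runs_for_range` are hypothetical and exist only for this example; they are not the parquet crate's types or this PR's implementation.

```rust
/// Hypothetical stand-in for one page-index entry: the number of rows in a
/// page plus the min/max of the column values stored in it.
#[derive(Debug, Clone, Copy)]
pub struct PageStats {
    pub row_count: usize,
    pub min: i64,
    pub max: i64,
}

/// Hypothetical run of consecutive rows that is either read or skipped.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RowRun {
    pub row_count: usize,
    pub skip: bool,
}

/// Build skip/select runs for the predicate `lo <= value <= hi`.
///
/// A page is skipped only when its [min, max] range cannot overlap [lo, hi];
/// adjacent pages with the same decision are merged into a single run.
pub fn runs_for_range(pages: &[PageStats], lo: i64, hi: i64) -> Vec<RowRun> {
    let mut runs: Vec<RowRun> = Vec::new();
    for page in pages {
        if page.row_count == 0 {
            continue; // empty pages contribute no rows
        }
        let skip = page.max < lo || page.min > hi;
        // Extend the previous run when the decision is unchanged.
        if let Some(last) = runs.last_mut() {
            if last.skip == skip {
                last.row_count += page.row_count;
                continue;
            }
        }
        runs.push(RowRun { row_count: page.row_count, skip });
    }
    runs
}
```

Under these assumptions, pages whose statistics rule them out never need to be fetched or decoded, which is the saving in IO and decoding that the suggested doc comment refers to.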



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

