[ https://issues.apache.org/jira/browse/ARROW-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415269#comment-17415269 ]
Weston Pace commented on ARROW-13998:
-------------------------------------
When testing or developing this feature, keep in mind that page statistics are
optional, and some older versions of Arrow may not write them. So just make
sure your test data actually has page statistics before you get too far.
> [C++] Add page skipping to parquet reading
> ------------------------------------------
>
> Key: ARROW-13998
> URL: https://issues.apache.org/jira/browse/ARROW-13998
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Jonathan Keane
> Priority: Major
>
> bq. We don’t do data page skipping at all in parquet-cpp. We should add this
> to the short list of holistic improvements to the datasets infrastructure —
> we support row group skipping using column chunk statistics, but that is very
> coarse-grained. Data pages are much more fine-grained:
> bq. https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L516
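The granularity gap can be shown with a small sketch. It assumes one column chunk split into pages, each with a hypothetical `(min, max)` pair, and a predicate `value > threshold`. The helper names are illustrative only. The sketch models the pruning decision, not parquet-cpp's reader or the Parquet page index layout. Column-chunk statistics summarize every page, so a single matching page forces the whole row group to be read. Page-level pruning reads only the pages whose range can match:

```python
from typing import List, Tuple

# Hypothetical (min, max) statistics for each data page of one column chunk.
PageRange = Tuple[int, int]


def chunk_stats(pages: List[PageRange]) -> PageRange:
    """Column-chunk statistics: min of page minimums, max of page maximums."""
    return min(lo for lo, _ in pages), max(hi for _, hi in pages)


def pages_read_with_row_group_skipping(pages: List[PageRange], threshold: int) -> int:
    """Coarse: read every page unless the whole chunk fails `value > threshold`."""
    _, chunk_max = chunk_stats(pages)
    return 0 if chunk_max <= threshold else len(pages)


def pages_read_with_page_skipping(pages: List[PageRange], threshold: int) -> int:
    """Fine-grained: read only pages whose max can exceed the threshold."""
    return sum(1 for _, hi in pages if hi > threshold)


pages = [(0, 99), (100, 199), (200, 299), (300, 399)]
print(pages_read_with_row_group_skipping(pages, 350))  # 4
print(pages_read_with_page_skipping(pages, 350))       # 1
```

With sorted or clustered data like this, row group statistics cannot rule anything out, while page statistics eliminate three of the four pages.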