[ https://issues.apache.org/jira/browse/ARROW-9459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris Van den Bossche updated ARROW-9459:
-----------------------------------------
Fix Version/s: (was: 3.0.0)
> [C++][Dataset] Make collecting/parsing statistics optional for ParquetFragment
> ------------------------------------------------------------------------------
>
> Key: ARROW-9459
> URL: https://issues.apache.org/jira/browse/ARROW-9459
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Joris Van den Bossche
> Priority: Major
> Labels: dataset, dataset-dask-integration
>
> See some timing checks here:
> https://github.com/dask/dask/pull/6346#issuecomment-656548675
> Parsing all statistics, even from a centralized {{_metadata}} file, can be
> quite expensive. If you know in advance that you will not use them
> (e.g., you only filter on the partition fields and otherwise read all
> the data), it would be useful to have an option to disable statistics
> parsing.
> cc [~rjzamora] [~bkietz] [~fsaintjacques]
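
To illustrate the request above, here is a minimal sketch in plain Python. It is not the Arrow API: `SketchFragment`, `select_fragments`, `parse_count`, and the JSON-encoded statistics payload are hypothetical names and simplifications introduced only for illustration. The idea shown is that statistics are parsed only when a statistics-based predicate is actually supplied, so partition-only filtering never pays the parsing cost.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class SketchFragment:
    """Hypothetical stand-in for a Parquet fragment (not the Arrow API)."""
    path: str
    partition: Dict[str, int]
    raw_statistics: str          # unparsed payload; JSON is used only for illustration
    parse_count: int = 0         # counts how often statistics were actually parsed
    _parsed: Optional[dict] = field(default=None, repr=False)

    def statistics(self) -> dict:
        # Parse lazily and cache, so the cost is paid at most once per fragment.
        if self._parsed is None:
            self._parsed = json.loads(self.raw_statistics)
            self.parse_count += 1
        return self._parsed


def select_fragments(
    fragments: List[SketchFragment],
    partition_pred: Callable[[Dict[str, int]], bool],
    stats_pred: Optional[Callable[[dict], bool]] = None,
) -> List[SketchFragment]:
    """Keep fragments matching the partition predicate.

    Statistics are consulted (and therefore parsed) only when a
    statistics predicate is supplied.
    """
    selected = []
    for frag in fragments:
        if not partition_pred(frag.partition):
            continue
        if stats_pred is not None and not stats_pred(frag.statistics()):
            continue
        selected.append(frag)
    return selected


fragments = [
    SketchFragment("year=2019/part-0.parquet", {"year": 2019},
                   '{"x": {"min": 0, "max": 9}}'),
    SketchFragment("year=2020/part-0.parquet", {"year": 2020},
                   '{"x": {"min": 10, "max": 19}}'),
]

# Partition-only filtering: no statistics are parsed for any fragment.
picked = select_fragments(fragments, lambda p: p["year"] == 2020)
```

In this toy model, parsing is deferred rather than controlled by an explicit flag. An explicit option on the fragment or dataset factory, as the issue proposes, would achieve the same effect for users who know up front that statistics are not needed.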
--
This message was sent by Atlassian Jira
(v8.3.4#803005)