Ildar created ARROW-4293:
----------------------------
Summary: [C++] Can't access parquet statistics on binary columns
Key: ARROW-4293
URL: https://issues.apache.org/jira/browse/ARROW-4293
Project: Apache Arrow
Issue Type: Bug
Reporter: Ildar
Hi,
I'm trying to use per-column statistics (min/max values) to filter out row
groups while reading a Parquet file, but no statistics are available for binary
columns. I noticed that {{ApplicationVersion::HasCorrectStatistics()}} discards
statistics whose sort order is {{UNSIGNED}} unless they were written by
{{parquet-cpp}}. As I understand it, {{parquet-mr}} used to have issues with
these statistics. Do those issues still persist?
For example, I have a Parquet file created with {{parquet-mr}} version 1.10, and
it seems to have correct min/max values for binary columns. {{parquet-cpp}}
works fine for me if I remove this check from {{HasCorrectStatistics()}}:
{code:cpp}
if (SortOrder::SIGNED != sort_order && !max_equals_min) {
  return false;
}
{code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)