[ https://issues.apache.org/jira/browse/PARQUET-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney moved ARROW-4293 to PARQUET-1494:
----------------------------------------------
Workflow: patch-available, re-open possible (was: jira)
Key: PARQUET-1494 (was: ARROW-4293)
Project: Parquet (was: Apache Arrow)
> [C++] Can't access parquet statistics on binary columns
> -------------------------------------------------------
>
> Key: PARQUET-1494
> URL: https://issues.apache.org/jira/browse/PARQUET-1494
> Project: Parquet
> Issue Type: Bug
> Reporter: Ildar
> Priority: Major
>
> Hi,
> I'm trying to use per-column statistics (min/max values) to filter out row
> groups while reading a Parquet file, but I don't see any statistics for
> binary columns. I noticed that {{ApplicationVersion::HasCorrectStatistics()}}
> discards statistics that have sort order {{UNSIGNED}} and were not created
> by {{parquet-cpp}}. As I understand it, there used to be issues with
> statistics written by {{parquet-mr}}. Do they still persist?
> For example, I have a Parquet file created with {{parquet-mr}} version 1.10 that
> appears to have correct min/max values for binary columns, and {{parquet-cpp}}
> works fine for me if I remove this code from the {{HasCorrectStatistics()}} function:
>
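> {code:cpp}
> // Rejects statistics whose sort order is not SIGNED (e.g. UNSIGNED for
> // binary columns) unless min and max are equal.
> if (SortOrder::SIGNED != sort_order && !max_equals_min) {
>   return false;
> }
> {code}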
>
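To illustrate why this check disables row-group filtering on binary columns, here is a minimal, self-contained C++ sketch of min/max row-group pruning gated by a condition shaped like the one quoted above. Everything in it is hypothetical and simplified for illustration: `SortOrder`, `ColumnChunkStats`, `HasTrustedStats`, `MustReadRowGroup`, `SelectRowGroups` and the `written_by_parquet_cpp` flag are not the parquet-cpp API, whose real check also depends on the writer's application version.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model for illustration only; these are NOT the
// parquet-cpp types or functions.
enum class SortOrder { SIGNED, UNSIGNED, UNKNOWN };

struct ColumnChunkStats {
  std::string min;              // minimum value of a binary column in one row group
  std::string max;              // maximum value of that column in the same row group
  SortOrder sort_order;         // UNSIGNED for plain binary columns
  bool written_by_parquet_cpp;  // hypothetical stand-in for the writer-version check
};

// Same shape as the quoted condition: for files not written by parquet-cpp,
// statistics with a non-SIGNED sort order are trusted only when min == max.
bool HasTrustedStats(const ColumnChunkStats& s) {
  if (s.written_by_parquet_cpp) return true;
  const bool max_equals_min = (s.min == s.max);
  if (SortOrder::SIGNED != s.sort_order && !max_equals_min) {
    return false;
  }
  return true;
}

// Returns true if the row group might contain `value` and therefore must be read.
// std::string compares bytes as unsigned char, which matches the unsigned
// lexicographic byte order used for binary column statistics.
bool MustReadRowGroup(const ColumnChunkStats& s, const std::string& value) {
  if (!HasTrustedStats(s)) return true;  // no usable min/max: cannot skip
  return !(value < s.min || s.max < value);
}

// Returns the indices of row groups that survive the filter `column == value`.
std::vector<std::size_t> SelectRowGroups(const std::vector<ColumnChunkStats>& groups,
                                         const std::string& value) {
  std::vector<std::size_t> selected;
  for (std::size_t i = 0; i < groups.size(); ++i) {
    if (MustReadRowGroup(groups[i], value)) selected.push_back(i);
  }
  return selected;
}
```

For example, a row group with statistics `{"apple", "banana"}` written by a non-parquet-cpp writer is always selected when filtering for `"cherry"`, because its statistics are discarded, while the same statistics from a trusted writer let the row group be skipped. This matches the behaviour described above, where files from {{parquet-mr}} 1.10 get no pruning on binary columns.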
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)