Milos Sukovic created PARQUET-1781:
--------------------------------------
Summary: [C++] 1.4.0+ reader ignore stats created by 1.3.* writer
Key: PARQUET-1781
URL: https://issues.apache.org/jira/browse/PARQUET-1781
Project: Parquet
Issue Type: Bug
Components: parquet-cpp
Affects Versions: cpp-1.5.0, cpp-1.4.0
Reporter: Milos Sukovic
[https://github.com/apache/arrow/commit/d257a88ed612301c0411894dfa783fcbff1bc867]
In the referenced commit, a change to metadata.cc altered how the reader
checks whether the new stats (min_value/max_value) are used.
From:
    if (metadata.statistics.__isset.max_value ||
        metadata.statistics.__isset.min_value)
to:
    if (descr->column_order().get_order() == ColumnOrder::TYPE_DEFINED_ORDER)
This change breaks backward compatibility. Files that contain the new stats
(min_value/max_value) and were created before this change are valid, but they
do not set the column order flag.
Because the reader now checks only the column order flag, it ignores the
stats in those files.
A possible fix would be something like the following (pseudocode):
    if (descr->column_order().get_order() == ColumnOrder::TYPE_DEFINED_ORDER ||
        (version == parquetcpp 1.3.* && (metadata.statistics.__isset.max_value ||
                                         metadata.statistics.__isset.min_value)))
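
To make the three checks easier to compare, here is a minimal self-contained
sketch. It does not use the real parquet-cpp classes: ColumnOrderKind,
StatsFlags, WriterVersion and the function names are hypothetical stand-ins
made up for illustration, and matching the writer on the application name
"parquet-cpp" with version 1.3 is an assumption about how the version test
in the pseudocode might be spelled.

```cpp
#include <string>

// Hypothetical stand-in for the column order reported by the descriptor.
enum class ColumnOrderKind { UNDEFINED, TYPE_DEFINED_ORDER };

// Hypothetical stand-in for statistics.__isset.{min_value,max_value}.
struct StatsFlags {
  bool has_min_value = false;
  bool has_max_value = false;
};

// Hypothetical stand-in for the parsed writer version of the file.
struct WriterVersion {
  std::string application;  // assumed to be "parquet-cpp" for C++ writers
  int major;
  int minor;
};

// Check used before the referenced commit: look only at the stats fields.
inline bool UsesNewStatsOld(const StatsFlags& stats) {
  return stats.has_max_value || stats.has_min_value;
}

// Check used after the referenced commit: look only at the column order.
inline bool UsesNewStatsCurrent(ColumnOrderKind order) {
  return order == ColumnOrderKind::TYPE_DEFINED_ORDER;
}

// Assumed spelling of "version == parquetcpp 1.3.*".
inline bool IsParquetCpp13(const WriterVersion& v) {
  return v.application == "parquet-cpp" && v.major == 1 && v.minor == 3;
}

// Proposed check: accept the column order flag, or fall back to the stats
// fields when the file was written by parquet-cpp 1.3.*.
inline bool UsesNewStatsProposed(ColumnOrderKind order, const StatsFlags& stats,
                                 const WriterVersion& v) {
  return UsesNewStatsCurrent(order) ||
         (IsParquetCpp13(v) && UsesNewStatsOld(stats));
}
```

For a file written by 1.3.* with min_value/max_value set and no column
order, the old check and the proposed check accept the stats, while the
current check rejects them.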
I checked parquet-mr, and it seems that columnOrder was introduced there as
part of the same change as min_value and max_value, so the issue should not
occur for files created by the Java code. However, the parquet-mr reader
probably also ignores the stats in files created by parquet-cpp 1.3.*.