Milos Sukovic created PARQUET-1781:
--------------------------------------

             Summary: [C++] 1.4.0+ reader ignores stats created by 1.3.* writer
                 Key: PARQUET-1781
                 URL: https://issues.apache.org/jira/browse/PARQUET-1781
             Project: Parquet
          Issue Type: Bug
          Components: parquet-cpp
    Affects Versions: cpp-1.5.0, cpp-1.4.0
            Reporter: Milos Sukovic


[https://github.com/apache/arrow/commit/d257a88ed612301c0411894dfa783fcbff1bc867]

In the referenced commit, a change to metadata.cc altered how the reader
checks whether the new stats (min_value/max_value) are used.

From

if (metadata.statistics.__isset.max_value ||
    metadata.statistics.__isset.min_value)

to

if (descr->column_order().get_order() == ColumnOrder::TYPE_DEFINED_ORDER)

 

This change breaks backward compatibility: files that contain the new stats
(min_value/max_value) and were created before this change are valid, but they
do not set the column order flag.

After this change, those stats are ignored, because the reader now checks the
column order flag instead of whether min_value/max_value are set.

A possible fix would be something like the following (the version test is
pseudocode):

if (descr->column_order().get_order() == ColumnOrder::TYPE_DEFINED_ORDER ||
    (version == parquetcpp 1.3.* &&
     (metadata.statistics.__isset.max_value ||
      metadata.statistics.__isset.min_value)))
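
To make the proposed condition concrete, here is a minimal, self-contained
sketch of the decision logic. It is not the actual metadata.cc code: the
types StatsFlags and WriterVersion and the helpers IsParquetCpp13 and
UseNewStats are hypothetical stand-ins for the Thrift __isset flags and for
the writer version parsed from the file's created_by string, and the
application name "parquet-cpp" is an assumption.

```cpp
#include <string>

// Hypothetical stand-in for the column order recorded in the file metadata.
enum class ColumnOrder { UNDEFINED, TYPE_DEFINED_ORDER };

// Hypothetical stand-in for metadata.statistics.__isset.{min,max}_value.
struct StatsFlags {
  bool has_min_value = false;
  bool has_max_value = false;
};

// Hypothetical writer identity, assumed to be parsed from created_by.
struct WriterVersion {
  std::string application;
  int major_version;
  int minor_version;
};

// True for files written by parquet-cpp 1.3.* (assumed application name).
inline bool IsParquetCpp13(const WriterVersion& writer) {
  return writer.application == "parquet-cpp" &&
         writer.major_version == 1 && writer.minor_version == 3;
}

// Decide whether min_value/max_value should be read.
// The first branch is the check introduced by the referenced commit; the
// second is the proposed fallback for 1.3.* files that set the new stats
// without setting the column order flag.
inline bool UseNewStats(ColumnOrder order, const StatsFlags& stats,
                        const WriterVersion& writer) {
  if (order == ColumnOrder::TYPE_DEFINED_ORDER) {
    return true;
  }
  return IsParquetCpp13(writer) &&
         (stats.has_min_value || stats.has_max_value);
}
```

With this shape, a 1.3.* file that sets min_value but leaves the column order
undefined is accepted, while files from any other writer still need the
column order flag.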

I checked parquet-mr, and it seems that columnOrder was introduced there as
part of the same change as min_value and max_value, so the issue shouldn't
occur for files created by the Java code. However, the parquet-mr reader
probably also ignores the stats in files created by parquet-cpp 1.3.*.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
