Matthew M created PARQUET-2100:
----------------------------------

             Summary: Merging two valid parquet files produces a corrupted 
result file in 1.12.1
                 Key: PARQUET-2100
                 URL: https://issues.apache.org/jira/browse/PARQUET-2100
             Project: Parquet
          Issue Type: Bug
          Components: parquet-mr
    Affects Versions: 1.12.1
            Reporter: Matthew M
         Attachments: input_file1.parquet, input_file2.parquet, 
output_file.parquet

This ticket relates to PARQUET-2027. In that ticket, merging two parquet 
files produced by 1.11.x failed in 1.12.0. In 1.12.1 the merge was fixed, 
i.e. it no longer fails, but it now produces a corrupted output file. The 
error:
{code:java}
Dictionary page must be before data page.
{code}
is thrown when one tries to read the output file. It comes from 
[https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/record_reader.cc#L712|https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/record_reader.cc#L712].

I have attached two example input files and the merged output file.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
