[ https://issues.apache.org/jira/browse/PARQUET-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew M updated PARQUET-2100:
-------------------------------
    Affects Version/s: 1.12.2

> Merging two valid parquet files produces a corrupted result file in 1.12.1
> --------------------------------------------------------------------------
>
>                 Key: PARQUET-2100
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2100
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.12.1, 1.12.2
>            Reporter: Matthew M
>            Priority: Major
>         Attachments: input_file1.parquet, input_file2.parquet, output_file.parquet
>
>
> This ticket relates to PARQUET-2027. In that ticket, merging two Parquet
> files produced by 1.11.x failed in 1.12.0. In 1.12.1 the merge was fixed,
> i.e. it no longer fails, but it now produces a corrupted output file. The
> error:
> {code:java}
> Dictionary page must be before data page.
> {code}
> is thrown when reading the merged file. It is raised here:
> [https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/record_reader.cc#L712].
> I have attached the two example input files and the merged output file.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
