[
https://issues.apache.org/jira/browse/PARQUET-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew M updated PARQUET-2100:
-------------------------------
Affects Version/s: 1.12.2
> Merging two valid parquet files produces a corrupted result file in 1.12.1
> --------------------------------------------------------------------------
>
> Key: PARQUET-2100
> URL: https://issues.apache.org/jira/browse/PARQUET-2100
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.12.1, 1.12.2
> Reporter: Matthew M
> Priority: Major
> Attachments: input_file1.parquet, input_file2.parquet,
> output_file.parquet
>
>
> This ticket relates to PARQUET-2027. In that ticket, merging two Parquet
> files produced by 1.11.x failed in 1.12.0. In 1.12.1 the merge was
> fixed, i.e. it no longer fails, but it now produces a corrupted output
> file. The error:
> {code:java}
> Dictionary page must be before data page.
> {code}
> is thrown when one tries to read the merged file. It is raised here:
> [https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/record_reader.cc#L712].
> I have attached two example input files and the merged output file.
>
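For readers unfamiliar with the error, here is a minimal sketch of the ordering rule the message refers to: within a column chunk, a dictionary page is expected to come before any data page. This is not the parquet-cpp or parquet-mr code. `check_page_order` is a hypothetical helper, and the assumption that the merged file contains a dictionary page after data pages is inferred from the error text only, not confirmed from the attachments. The enum values follow the `PageType` enum in the Parquet format definition.

```python
from enum import IntEnum
from typing import Iterable


class PageType(IntEnum):
    # Values mirror the PageType enum in the Parquet format definition.
    DATA_PAGE = 0
    INDEX_PAGE = 1
    DICTIONARY_PAGE = 2
    DATA_PAGE_V2 = 3


def check_page_order(page_types: Iterable[PageType]) -> None:
    """Hypothetical check: raise if a dictionary page follows a data page.

    `page_types` is the sequence of page types of ONE column chunk,
    in the order they appear in the file.
    """
    seen_data_page = False
    for page_type in page_types:
        if page_type in (PageType.DATA_PAGE, PageType.DATA_PAGE_V2):
            seen_data_page = True
        elif page_type == PageType.DICTIONARY_PAGE and seen_data_page:
            # Same wording as the error reported in this ticket.
            raise ValueError("Dictionary page must be before data page.")


# A well-formed chunk: dictionary page first, then data pages.
check_page_order([PageType.DICTIONARY_PAGE, PageType.DATA_PAGE, PageType.DATA_PAGE])

# A chunk laid out the way the error suggests (assumed, not verified).
try:
    check_page_order([PageType.DATA_PAGE, PageType.DICTIONARY_PAGE])
except ValueError as err:
    print(err)
```

If the merge writes the pages of two source chunks one after another without rewriting them, the second source's dictionary page could end up after the first source's data pages, which would trip a check like this one. That is only one possible explanation and would need to be confirmed against output_file.parquet.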