[
https://issues.apache.org/jira/browse/PARQUET-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew M updated PARQUET-2100:
-------------------------------
Description:
This ticket relates to PARQUET-2027. In that ticket, merging two Parquet
files produced by 1.11.x failed in 1.12.0. In 1.12.1 the merge was fixed,
i.e. it no longer fails, but it now produces a corrupted output file. The
error:
{code:java}
Dictionary page must be before data page.
{code}
is thrown when reading the merged file. It originates here:
[https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/record_reader.cc#L712].
I have attached two example input files and the merged output file.
> Merging two valid parquet files produces a corrupted result file in 1.12.1
> --------------------------------------------------------------------------
>
> Key: PARQUET-2100
> URL: https://issues.apache.org/jira/browse/PARQUET-2100
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.12.1
> Reporter: Matthew M
> Priority: Major
> Attachments: input_file1.parquet, input_file2.parquet,
> output_file.parquet
>
>
> This ticket relates to PARQUET-2027. In that ticket, merging two Parquet
> files produced by 1.11.x failed in 1.12.0. In 1.12.1 the merge was fixed,
> i.e. it no longer fails, but it now produces a corrupted output file. The
> error:
> {code:java}
> Dictionary page must be before data page.
> {code}
> is thrown when reading the merged file. It originates here:
> [https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/record_reader.cc#L712].
> I have attached two example input files and the merged output file.
>
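For context on the error message quoted above: in the Parquet format, a dictionary page, when present, has to come before the data pages of its column chunk. The sketch below models that ordering rule on a plain list of page types. It is only an illustration and is not the parquet-cpp or parquet-mr code; `PageType`, `check_page_order` and `is_valid_chunk` are hypothetical names, and whether the merged file actually contains a dictionary page after a data page is an assumption that the ticket does not confirm.

```python
from enum import Enum


class PageType(Enum):
    """Hypothetical, simplified page kinds within one column chunk."""
    DICTIONARY = "dictionary"
    DATA = "data"


def check_page_order(page_types):
    """Raise ValueError if a dictionary page appears after a data page."""
    seen_data_page = False
    for page_type in page_types:
        if page_type is PageType.DICTIONARY and seen_data_page:
            # Same wording as the error reported in the ticket.
            raise ValueError("Dictionary page must be before data page.")
        if page_type is PageType.DATA:
            seen_data_page = True


def is_valid_chunk(page_types):
    """Return True when the page sequence respects the ordering rule."""
    try:
        check_page_order(page_types)
    except ValueError:
        return False
    return True
```

Under this model, a merge that appends the pages of a second column chunk (including its dictionary page) to an existing chunk without starting a new chunk would yield a sequence such as data, dictionary, data, which the check rejects.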