Matthew M created PARQUET-2100:
----------------------------------
Summary: Merging two valid parquet files produces a corrupted
result file in 1.12.1
Key: PARQUET-2100
URL: https://issues.apache.org/jira/browse/PARQUET-2100
Project: Parquet
Issue Type: Bug
Components: parquet-mr
Affects Versions: 1.12.1
Reporter: Matthew M
Attachments: input_file1.parquet, input_file2.parquet,
output_file.parquet
This ticket relates to PARQUET-2027. In that ticket, merging two Parquet files
produced by 1.11.x failed in 1.12.0. In 1.12.1 the merge was fixed, i.e. it no
longer fails, but it now produces a corrupted output file. The error:
{code:java}
Dictionary page must be before data page.
{code}
is thrown when one tries to read the merged file. It comes from this line in parquet-cpp:
[https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/record_reader.cc#L712|https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/record_reader.cc#L712].
I attached the two example input files and the resulting merged file.
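In the Parquet format, a column chunk's optional dictionary page is stored as the first page of the chunk, ahead of its data pages. Assuming the error indicates that this ordering was broken during the merge, one hypothetical way to triage output_file.parquet is to compare the page offsets recorded in its footer. The sketch below is an assumption-laden diagnostic, not the parquet-mr fix. ChunkOffsets, misordered_chunks and read_offsets are illustrative names. It only catches the case where the footer itself records a dictionary page at or after the first data page. Pages that were physically written out of order while the footer looks consistent would not be flagged.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ChunkOffsets:
    """Footer offsets for one column chunk (hypothetical helper type)."""
    row_group: int
    column: str
    dictionary_page_offset: Optional[int]  # None when the chunk has no dictionary page
    data_page_offset: int


def misordered_chunks(chunks: Iterable[ChunkOffsets]) -> List[ChunkOffsets]:
    """Return chunks whose dictionary page does not start before the first data page."""
    return [
        c for c in chunks
        if c.dictionary_page_offset is not None
        and c.dictionary_page_offset >= c.data_page_offset
    ]


def read_offsets(path: str) -> List[ChunkOffsets]:
    """Collect per-chunk offsets from a file footer (assumes pyarrow is installed)."""
    import pyarrow.parquet as pq  # imported lazily so misordered_chunks works without pyarrow

    meta = pq.ParquetFile(path).metadata
    out = []
    for rg in range(meta.num_row_groups):
        row_group = meta.row_group(rg)
        for col in range(row_group.num_columns):
            cc = row_group.column(col)
            # Only trust the dictionary offset when the footer says a dictionary page exists
            dict_off = cc.dictionary_page_offset if cc.has_dictionary_page else None
            out.append(ChunkOffsets(rg, cc.path_in_schema, dict_off, cc.data_page_offset))
    return out
```

For example, `misordered_chunks(read_offsets("output_file.parquet"))` would list suspicious chunks in the attached output. An empty result only means this particular footer check found nothing, not that the file is valid.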
--
This message was sent by Atlassian Jira
(v8.3.4#803005)