Krzysztof Chmielewski created FLINK-31197:
---------------------------------------------
Summary: Exception while writing Parquet files containing Arrays
with complex types.
Key: FLINK-31197
URL: https://issues.apache.org/jira/browse/FLINK-31197
Project: Flink
Issue Type: Bug
Affects Versions: 1.16.1, 1.15.3, 1.15.2, 1.16.0, 1.15.1, 1.15.0, 1.17.0,
1.15.4, 1.16.2, 1.17.1, 1.15.5
Reporter: Krzysztof Chmielewski
Attachments: ParquetSinkArrayOfArraysIssue.java
After https://issues.apache.org/jira/browse/FLINK-17782 it should be possible
to write complex types with the File sink using the Parquet format.
However, it turns out that it is still impossible to write types such as:
Array<Array>
Array<Map>
Array<Row>
When trying to write a Parquet row with such types, the following exception is
thrown:
{code:java}
Caused by: java.lang.RuntimeException: org.apache.parquet.io.ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead
    at org.apache.flink.formats.parquet.row.ParquetRowDataBuilder$ParquetWriteSupport.write(ParquetRowDataBuilder.java:91)
    at org.apache.flink.formats.parquet.row.ParquetRowDataBuilder$ParquetWriteSupport.write(ParquetRowDataBuilder.java:71)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
    at org.apache.flink.formats.parquet.ParquetBulkWriter.addElement(ParquetBulkWriter.java:52)
    at org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.write(BulkPartWriter.java:51)
    at org.apache.flink.connector.file.sink.writer.FileWriterBucket.write(FileWriterBucket.java:191)
{code}
The exception is misleading and does not point to the real problem.
The reason these complex types still do not work is that, during the
development of https://issues.apache.org/jira/browse/FLINK-17782,
the code paths for those types were left without an implementation: no
"unsupported" exception, nothing, just empty methods. In
https://github.com/apache/flink/blob/release-1.16.1/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/row/ParquetRowDataWriter.java
you will see
{code:java}
@Override
public void write(ArrayData arrayData, int ordinal) {}
{code}
for MapWriter, ArrayWriter and RowWriter.
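To illustrate why an empty writer method can surface as "empty fields are illegal" rather than as a clear "unsupported" error, here is a minimal stand-alone model. It is not Flink or Parquet code. It assumes that the record consumer rejects a field that is started and ended with no value added in between, which is what the exception message suggests. ToyRecordConsumer, ToyFieldWriter and EmptyWriterDemo are hypothetical names used only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Parquet record consumer; NOT the real Parquet API. */
class ToyRecordConsumer {
    private boolean fieldIsEmpty;
    final List<String> written = new ArrayList<>();

    void startField(String name) {
        // A freshly started field has no values yet.
        fieldIsEmpty = true;
    }

    void addInteger(int value) {
        fieldIsEmpty = false;
        written.add(Integer.toString(value));
    }

    void endField(String name) {
        // Assumption: a field closed without any value is rejected.
        if (fieldIsEmpty) {
            throw new IllegalStateException(
                    "empty fields are illegal, the field should be ommited completely instead");
        }
    }
}

/** Hypothetical writer contract, loosely modelled on the empty methods quoted above. */
interface ToyFieldWriter {
    void write(int[] values, ToyRecordConsumer consumer);
}

class EmptyWriterDemo {
    // Mirrors the unimplemented code path: the method body does nothing.
    static final ToyFieldWriter EMPTY_WRITER = (values, consumer) -> {};

    // A working writer for comparison.
    static final ToyFieldWriter INT_WRITER = (values, consumer) -> {
        for (int v : values) {
            consumer.addInteger(v);
        }
    };

    static void writeField(String name, int[] values, ToyFieldWriter writer, ToyRecordConsumer consumer) {
        consumer.startField(name);
        writer.write(values, consumer);
        consumer.endField(name); // fails here if the writer added nothing
    }

    public static void main(String[] args) {
        ToyRecordConsumer consumer = new ToyRecordConsumer();
        writeField("ints", new int[] {1, 2, 3}, INT_WRITER, consumer);
        try {
            writeField("nested", new int[] {1, 2, 3}, EMPTY_WRITER, consumer);
        } catch (IllegalStateException e) {
            System.out.println("Failed far from the real cause: " + e.getMessage());
        }
    }
}
```

In this model the failure is raised when the field is closed, not inside the empty writer, which is why the stack trace gives no hint that the writer itself is unimplemented.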
I see two problems here.
1. Writing those three types is still not possible.
2. Flink throws an exception that gives no hint about the real issue.
It could throw "Unsupported operation" for now. Maybe this should be an item for
a different ticket?
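As a rough sketch of the second point, an unimplemented writer could fail fast with a descriptive message instead of silently doing nothing. The interface and class names below (SketchFieldWriter, UnsupportedElementWriter, FailFastDemo) are hypothetical and do not match Flink's actual ParquetRowDataWriter internals. The sketch only shows the shape of the idea.

```java
/** Hypothetical stand-in for the field-writer contract; NOT Flink's actual interface. */
interface SketchFieldWriter {
    void write(Object arrayData, int ordinal);
}

/** Fails fast with a descriptive message instead of silently writing nothing. */
final class UnsupportedElementWriter implements SketchFieldWriter {
    private final String elementType;

    UnsupportedElementWriter(String elementType) {
        this.elementType = elementType;
    }

    @Override
    public void write(Object arrayData, int ordinal) {
        throw new UnsupportedOperationException(
                "Writing array elements of type " + elementType
                        + " to Parquet is not supported yet");
    }
}

class FailFastDemo {
    public static void main(String[] args) {
        SketchFieldWriter writer = new UnsupportedElementWriter("ARRAY");
        try {
            writer.write(new int[][] {{1, 2}}, 0);
        } catch (UnsupportedOperationException e) {
            // The message now names the unsupported type directly.
            System.out.println(e.getMessage());
        }
    }
}
```

With a writer like this, the user would see which element type is unsupported at the point of failure rather than a Parquet encoding error further down the stack.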
The code to reproduce this issue is attached to the ticket. It tries to write
a single row with one column of type Array<Array<int>> to a Parquet file.