[ https://issues.apache.org/jira/browse/FLINK-31197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krzysztof Chmielewski updated FLINK-31197:
------------------------------------------
Description:
After https://issues.apache.org/jira/browse/FLINK-17782 it should be possible to write complex types with the File sink using the Parquet format. However, it turns out that it is still impossible to write types such as:
Array<Array>
Array<Map>
Array<Row>

When trying to write a Parquet row with such types, the below exception is thrown:
{code:java}
Caused by: java.lang.RuntimeException: org.apache.parquet.io.ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead
    at org.apache.flink.formats.parquet.row.ParquetRowDataBuilder$ParquetWriteSupport.write(ParquetRowDataBuilder.java:91)
    at org.apache.flink.formats.parquet.row.ParquetRowDataBuilder$ParquetWriteSupport.write(ParquetRowDataBuilder.java:71)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
    at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
    at org.apache.flink.formats.parquet.ParquetBulkWriter.addElement(ParquetBulkWriter.java:52)
    at org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.write(BulkPartWriter.java:51)
    at org.apache.flink.connector.file.sink.writer.FileWriterBucket.write(FileWriterBucket.java:191)
{code}
The exception is misleading and does not point at the real problem. The reason these complex types still cannot be written is that during development of https://issues.apache.org/jira/browse/FLINK-17782 the code paths for these types were left unimplemented: no UnsupportedOperationException, nothing, simply empty methods.

In https://github.com/apache/flink/blob/release-1.16.1/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/row/ParquetRowDataWriter.java you will see
{code:java}
@Override
public void write(ArrayData arrayData, int ordinal) {}
{code}
for MapWriter, ArrayWriter and RowWriter.

I see two problems here:
1. Writing those three types is still not possible.
2. Flink throws an exception that gives no hint about the real issue. It could throw "Unsupported operation" for now. Maybe this should be an item for a different ticket?

The code to reproduce this issue is attached to the ticket. It tries to write to a Parquet file a single row with one column of type Array<Array<int>>.
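As an interim mitigation for problem 2, the empty writer methods could fail fast instead of silently writing nothing. The sketch below uses hypothetical stand-in names (FieldWriter, FailFastWriterSketch), not the actual Flink classes, to show the difference: the caller gets an explicit UnsupportedOperationException instead of the confusing downstream ParquetEncodingException.

{code:java}
// Hypothetical stand-in for the package-private writer interface in
// org.apache.flink.formats.parquet.row.ParquetRowDataWriter.
interface FieldWriter {
    void write(Object value, int ordinal);
}

class FailFastWriterSketch {
    // Instead of an empty method body (which later surfaces as
    // "ParquetEncodingException: empty fields are illegal"), the
    // unimplemented writer fails fast with a descriptive message.
    static final FieldWriter ARRAY_OF_ARRAY_WRITER = (value, ordinal) -> {
        throw new UnsupportedOperationException(
                "Parquet writer does not yet support nested type at ordinal "
                        + ordinal + "; see FLINK-31197");
    };

    public static void main(String[] args) {
        try {
            // Simulates writing a single Array<Array<int>> column value.
            ARRAY_OF_ARRAY_WRITER.write(new int[][] {{1, 2}}, 0);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
{code}

With this in place the failure happens at the point where the unsupported type is first encountered, which makes the root cause obvious from the stack trace.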
> Exception while writing Parquet files containing Arrays with complex types.
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-31197
>                 URL: https://issues.apache.org/jira/browse/FLINK-31197
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.15.0, 1.15.1, 1.16.0, 1.17.0, 1.15.2, 1.15.3, 1.16.1, 1.15.4, 1.16.2, 1.17.1, 1.15.5
>            Reporter: Krzysztof Chmielewski
>            Priority: Major
>        Attachments: ParquetSinkArrayOfArraysIssue.java
--
This message was sent by Atlassian Jira
(v8.20.10#820010)