[ 
https://issues.apache.org/jira/browse/FLINK-31197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof Chmielewski updated FLINK-31197:
------------------------------------------
    Description: 
After https://issues.apache.org/jira/browse/FLINK-17782 it should be possible 
to write complex types with the File sink using the Parquet format. 

However, it turns out that it is still impossible to write types such as:
Array<Array>
Array<Map>
Array<Row> 

When trying to write a Parquet row with such types, the below exception is 
thrown:
{code:java}
Caused by: java.lang.RuntimeException: 
org.apache.parquet.io.ParquetEncodingException: empty fields are illegal, the 
field should be ommited completely instead
        at 
org.apache.flink.formats.parquet.row.ParquetRowDataBuilder$ParquetWriteSupport.write(ParquetRowDataBuilder.java:91)
        at 
org.apache.flink.formats.parquet.row.ParquetRowDataBuilder$ParquetWriteSupport.write(ParquetRowDataBuilder.java:71)
        at 
org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:138)
        at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:310)
        at 
org.apache.flink.formats.parquet.ParquetBulkWriter.addElement(ParquetBulkWriter.java:52)
        at 
org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.write(BulkPartWriter.java:51)
        at 
org.apache.flink.connector.file.sink.writer.FileWriterBucket.write(FileWriterBucket.java:191)

{code}
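For context, Parquet's "empty fields are illegal" check fires when a field is started in a record but no value is ever added to it before the field is closed (the misspelling "ommited" is in Parquet's actual message). A minimal, self-contained model of that mechanism, with hypothetical names that are not real Flink or Parquet classes:
{code:java}
// Simplified model of Parquet's record-consumer check, NOT actual
// Flink/Parquet code. It shows why a writer method with an empty body
// surfaces as "empty fields are illegal" much later, in endField().
public class EmptyFieldDemo {

    static class RecordConsumer {
        private int valuesInCurrentField = 0;

        void startField() {
            valuesInCurrentField = 0;
        }

        void addValue(int v) {
            valuesInCurrentField++;
        }

        void endField() {
            // Parquet rejects a field that was opened but never written to.
            if (valuesInCurrentField == 0) {
                throw new RuntimeException(
                        "empty fields are illegal, the field should be "
                                + "ommited completely instead");
            }
        }
    }

    public static void main(String[] args) {
        RecordConsumer consumer = new RecordConsumer();
        consumer.startField();
        // A no-op writer body (like the empty write(ArrayData, int))
        // adds no value here, so closing the field fails.
        try {
            consumer.endField();
        } catch (RuntimeException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
{code}

This is why the stack trace points at the generic write path rather than at the unimplemented nested-type writers.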


The exception is misleading and does not show the real problem. 
The reason why those complex types still do not work is that during the 
development of https://issues.apache.org/jira/browse/FLINK-17782 
the code paths for those types were left without implementation: no 
UnsupportedOperationException, nothing, simply empty methods. In 
https://github.com/apache/flink/blob/release-1.16.1/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/row/ParquetRowDataWriter.java
you will see 
{code:java}
@Override
public void write(ArrayData arrayData, int ordinal) {}
{code}

for MapWriter, ArrayWriter and RowWriter.
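A fail-fast variant of those empty methods, as suggested in problem 2 below, could look like the following sketch. The interface and class names here are hypothetical placeholders, not the actual Flink types:
{code:java}
// Hypothetical sketch, NOT the actual ParquetRowDataWriter code: contrasts
// the current silent no-op with a writer that fails fast and names the
// real limitation instead of letting Parquet fail later.
public class FailFastWriterSketch {

    interface ElementWriter {
        void write(Object element, int ordinal);
    }

    // Models the current behavior: an empty body that writes nothing,
    // so Parquet later fails with the confusing "empty fields" error.
    static class SilentWriter implements ElementWriter {
        @Override
        public void write(Object element, int ordinal) {}
    }

    // Suggested behavior: point directly at the missing feature.
    static class FailFastWriter implements ElementWriter {
        @Override
        public void write(Object element, int ordinal) {
            throw new UnsupportedOperationException(
                    "Writing nested ARRAY/MAP/ROW elements inside an ARRAY "
                            + "is not yet supported by the Parquet format");
        }
    }
}
{code}

With such a message the user immediately sees an unsupported feature rather than a seemingly corrupt record.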

I see two problems here:
1. Writing those three types is still not possible.
2. Flink throws an exception that gives no hint about the real issue. It 
could throw an UnsupportedOperationException for now. Maybe this should be 
an item for a different ticket?


The code to reproduce this issue is attached to the ticket. It tries to write 
a single row with one column of type Array<Array<int>> to a Parquet file.



> Exception while writing Parquet files containing Arrays with complex types.
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-31197
>                 URL: https://issues.apache.org/jira/browse/FLINK-31197
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.15.0, 1.15.1, 1.16.0, 1.17.0, 1.15.2, 1.15.3, 1.16.1, 
> 1.15.4, 1.16.2, 1.17.1, 1.15.5
>            Reporter: Krzysztof Chmielewski
>            Priority: Major
>         Attachments: ParquetSinkArrayOfArraysIssue.java
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
