[ 
https://issues.apache.org/jira/browse/FLINK-31202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krzysztof Chmielewski updated FLINK-31202:
------------------------------------------
    Attachment: ParquetSourceArrayOfArraysIssue.java
                ParquetSourceArrayOfRowIssue.java

> Add support for reading Parquet files containing Arrays with complex types.
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-31202
>                 URL: https://issues.apache.org/jira/browse/FLINK-31202
>             Project: Flink
>          Issue Type: New Feature
>    Affects Versions: 1.16.0, 1.17.0, 1.16.1, 1.16.2, 1.17.1
>            Reporter: Krzysztof Chmielewski
>            Priority: Major
>         Attachments: ParquetSourceArrayOfArraysIssue.java, 
> ParquetSourceArrayOfRowIssue.java, arrayOfArrayOfInts.snappy.parquet, 
> arrayOfrows.snappy.parquet
>
>
> Reading complex types from Parquet has been possible since Flink 1.16, after 
> https://issues.apache.org/jira/browse/FLINK-24614 was implemented.
> However, that implementation lacks support for reading nested complex types 
> such as:
> * Array<Array>
> * Array<Map>
> * Array<Row>
> This ticket is about adding support for reading the above types from Parquet 
> files.
> Currently, when trying to read a Parquet file containing a column of such a 
> type, one of the exceptions below is thrown:
> {code:java}
> Caused by: java.lang.RuntimeException: Unsupported type in the list: ROW<`f1` INT>
>       at org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readPrimitiveTypedRow(ArrayColumnReader.java:175)
>       at org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.fetchNextValue(ArrayColumnReader.java:113)
>       at org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readToVector(ArrayColumnReader.java:81)
> {code}
> OR:
> {code:java}
> Caused by: java.lang.RuntimeException: Unsupported type in the list: ARRAY<INT>
>       at org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readPrimitiveTypedRow(ArrayColumnReader.java:175)
>       at org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.fetchNextValue(ArrayColumnReader.java:113)
>       at org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readToVector(ArrayColumnReader.java:81)
> {code}
> The Parquet files and reproducer code are attached to the ticket.
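
For illustration only, the failure pattern visible in the stack traces can be modeled with a toy dispatcher: array elements of primitive kinds are accepted, and any nested complex kind is rejected with the same message prefix. This is a simplified sketch, not the actual source of Flink's ArrayColumnReader; the class name, the ElementKind enum, and checkElementKind are all hypothetical names introduced here.

```java
import java.util.EnumSet;
import java.util.Set;

public class ArrayElementDispatchSketch {

    /** Hypothetical element kinds for this sketch; not Flink's own type model. */
    enum ElementKind { INT, BIGINT, STRING, ARRAY, MAP, ROW }

    /** Kinds the toy reader accepts as array elements (primitives only). */
    static final Set<ElementKind> SUPPORTED =
            EnumSet.of(ElementKind.INT, ElementKind.BIGINT, ElementKind.STRING);

    /**
     * Returns normally for supported element kinds and throws for nested
     * complex kinds, imitating the message prefix seen in the stack traces.
     */
    static void checkElementKind(ElementKind kind) {
        if (!SUPPORTED.contains(kind)) {
            throw new RuntimeException("Unsupported type in the list: " + kind);
        }
    }

    public static void main(String[] args) {
        // Walk over every kind and report whether an array of it would be readable.
        for (ElementKind kind : ElementKind.values()) {
            try {
                checkElementKind(kind);
                System.out.println("ARRAY<" + kind + "> is readable");
            } catch (RuntimeException e) {
                System.out.println("ARRAY<" + kind + "> is rejected: " + e.getMessage());
            }
        }
    }
}
```

In this model, supporting the types listed in the ticket would mean replacing the rejection branch with a recursive read of the nested element type. How that is best done in the real reader is outside the scope of this sketch.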



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
