[ https://issues.apache.org/jira/browse/FLINK-31202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Krzysztof Chmielewski updated FLINK-31202:
------------------------------------------
Description:
Reading complex types from Parquet files has been possible since Flink 1.16, after
https://issues.apache.org/jira/browse/FLINK-24614 was implemented.
However, that implementation does not support reading arrays of complex (nested)
types, such as:
* Array<Array>
* Array<Map>
* Array<Row>
This ticket adds support for reading the types listed above from Parquet files.
Currently, reading a Parquet file that contains a column of such a type fails
with the following exception:
{code:java}
Caused by: java.lang.RuntimeException: Unsupported type in the list: ROW<`f1` INT>
	at org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readPrimitiveTypedRow(ArrayColumnReader.java:175)
	at org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.fetchNextValue(ArrayColumnReader.java:113)
	at org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readToVector(ArrayColumnReader.java:81)
{code}
OR:
{code:java}
Caused by: java.lang.RuntimeException: Unsupported type in the list: ARRAY<INT>
	at org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readPrimitiveTypedRow(ArrayColumnReader.java:175)
	at org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.fetchNextValue(ArrayColumnReader.java:113)
	at org.apache.flink.formats.parquet.vector.reader.ArrayColumnReader.readToVector(ArrayColumnReader.java:81)
{code}
Parquet files and reproducer code are attached to the ticket.
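For context on why nested element types need dedicated handling: in Parquet, a column such as ARRAY<ARRAY<INT>> is stored as a single leaf column of INT values, each tagged with a repetition level and a definition level, and a reader has to rebuild every nesting level from those levels. The snippet below is a minimal, self-contained sketch of that round trip (Dremel-style shredding and assembly). It is a hypothetical illustration, not Flink's ArrayColumnReader and not a proposed patch; the class NestedListLevels, its methods, and the simplified schema (column, list elements, and values all non-null, so the maximum repetition level and maximum definition level are both 2) are assumptions.

{code:java}
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration (not Flink code): shredding ARRAY<ARRAY<INT>> into one leaf
 * column with repetition/definition levels, and assembling the nested lists back.
 * Assumes column, list elements and values are all non-null, so only the two repeated
 * fields contribute to the levels (max repetition level = 2, max definition level = 2).
 */
public class NestedListLevels {

    /** One leaf-column entry: the two levels plus the value (null when a list is empty). */
    record Entry(int rep, int def, Integer value) {}

    /** Flattens records into leaf entries. */
    static List<Entry> shred(List<List<List<Integer>>> records) {
        List<Entry> out = new ArrayList<>();
        for (List<List<Integer>> outer : records) {
            if (outer.isEmpty()) {
                out.add(new Entry(0, 0, null)); // empty outer list: nothing below is defined
                continue;
            }
            for (int i = 0; i < outer.size(); i++) {
                int outerRep = (i == 0) ? 0 : 1; // 0 = new record, 1 = new outer element
                List<Integer> inner = outer.get(i);
                if (inner.isEmpty()) {
                    out.add(new Entry(outerRep, 1, null)); // outer element exists, inner list empty
                    continue;
                }
                for (int j = 0; j < inner.size(); j++) {
                    int rep = (j == 0) ? outerRep : 2; // 2 = next value in the same inner list
                    out.add(new Entry(rep, 2, inner.get(j)));
                }
            }
        }
        return out;
    }

    /** Rebuilds the nested records from leaf entries. */
    static List<List<List<Integer>>> assemble(List<Entry> entries) {
        List<List<List<Integer>>> records = new ArrayList<>();
        List<List<Integer>> outer = null;
        List<Integer> inner = null;
        for (Entry e : entries) {
            if (e.rep() == 0) { // repetition level 0 starts a new record
                outer = new ArrayList<>();
                records.add(outer);
            }
            if (e.def() == 0) {
                continue; // the outer list of this record is empty
            }
            if (e.rep() <= 1) { // a new outer element, i.e. a new inner list
                inner = new ArrayList<>();
                outer.add(inner);
            }
            if (e.def() == 2) {
                inner.add(e.value()); // a value is present in the current inner list
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<List<List<Integer>>> records = List.of(
                List.of(List.of(1, 2), List.of(), List.of(3)),
                List.of(),
                List.of(List.of(4)));
        List<Entry> entries = shred(records);
        entries.forEach(System.out::println);
        System.out.println(assemble(entries));
    }
}
{code}

Running main prints the six leaf entries followed by [[[1, 2], [], [3]], [], [[4]]], i.e. the input round-trips. In the standard three-level LIST layout the list column and its elements are often declared optional, which adds further definition levels to represent nulls; the sketch leaves those out to keep the level arithmetic readable. An ARRAY<ROW<...>> column adds one leaf column per row field, all of which must be assembled consistently.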
> Add support for reading Parquet files containing Arrays with complex types.
> ---------------------------------------------------------------------------
>
> Key: FLINK-31202
> URL: https://issues.apache.org/jira/browse/FLINK-31202
> Project: Flink
> Issue Type: New Feature
> Affects Versions: 1.16.0, 1.17.0, 1.16.1, 1.16.2, 1.17.1
> Reporter: Krzysztof Chmielewski
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)