[
https://issues.apache.org/jira/browse/HUDI-6211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Danny Chen closed HUDI-6211.
----------------------------
Fix Version/s: 0.14.0
Resolution: Fixed
Fixed via master branch: 014168b948a5a1ca88ea3e8cca213dfd46cef3f4
> Fix reading of schema-evolved complex columns for Flink
> -------------------------------------------------------
>
> Key: HUDI-6211
> URL: https://issues.apache.org/jira/browse/HUDI-6211
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: voon
> Assignee: voon
> Priority: Major
> Fix For: 0.14.0
>
>
> If a Hudi table's complex-type (struct) column is evolved by adding new fields
> or reordering existing fields, Hudi on Flink cannot read the Parquet files
> that were written with the old schema.
>
> Different errors are thrown depending on the schema evolution operation. If a
> new field is appended to the end of the struct column, the following error is
> thrown:
>
> {code:java}
> Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
> at java.util.ArrayList.rangeCheck(ArrayList.java:659)
> at java.util.ArrayList.get(ArrayList.java:435)
> at org.apache.parquet.schema.GroupType.getType(GroupType.java:216)
> at org.apache.hudi.table.format.cow.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:528)
> at org.apache.hudi.table.format.cow.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:416)
> at org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.createWritableVectors(ParquetColumnarRowSplitReader.java:217)
> at org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.<init>(ParquetColumnarRowSplitReader.java:157)
> at org.apache.hudi.table.format.cow.ParquetSplitReaderUtil.genPartColumnarRowReader(ParquetSplitReaderUtil.java:148)
> at org.apache.hudi.table.format.RecordIterators.getParquetRecordIterator(RecordIterators.java:72)
> at org.apache.hudi.table.format.cow.CopyOnWriteInputFormat.open(CopyOnWriteInputFormat.java:132)
> at org.apache.hudi.table.format.cow.CopyOnWriteInputFormat.open(CopyOnWriteInputFormat.java:66)
> at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:84)
> at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110)
> at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67)
> at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:332)
> {code}
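The trace ends in an ArrayList lookup inside GroupType.getType, which is consistent with the reader indexing the file's (older, narrower) struct by position while iterating over the fields of the evolved struct. The simplified model below illustrates that failure mode under this assumption; the Field class and resolveByPosition method are hypothetical and are not Hudi or Parquet APIs, nor a copy of the code in ParquetSplitReaderUtil.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a struct schema; NOT a Hudi or Parquet API.
public class PositionalStructLookup {

    static final class Field {
        final String name;
        final String type;

        Field(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Looks up each requested field by POSITION in the file's struct.
    // Throws IndexOutOfBoundsException once the requested struct is wider than the file's.
    static List<Field> resolveByPosition(List<Field> fileFields, List<Field> requested) {
        List<Field> resolved = new ArrayList<>();
        for (int i = 0; i < requested.size(); i++) {
            resolved.add(fileFields.get(i));
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Struct as written in the old Parquet file: struct<a INT32, b BINARY>
        List<Field> fileFields = new ArrayList<>(Arrays.asList(
                new Field("a", "INT32"), new Field("b", "BINARY")));
        // Evolved table schema with c appended: struct<a INT32, b BINARY, c INT64>
        List<Field> requested = Arrays.asList(
                new Field("a", "INT32"), new Field("b", "BINARY"), new Field("c", "INT64"));
        try {
            resolveByPosition(fileFields, requested);
        } catch (IndexOutOfBoundsException e) {
            // Index 2 is requested from a 2-element list, like the failing lookup in the trace.
            System.out.println("positional lookup failed: " + e.getMessage());
        }
    }
}
{code}

The exact exception message depends on the JDK version, so only the exception type is meaningful here.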
>
>
> If a new field is added in the middle of the struct column, the following
> error is thrown:
>
> {code:java}
> java.lang.IllegalArgumentException: Unexpected type: INT32
> at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:77)
> at org.apache.hudi.table.format.cow.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:456)
> at org.apache.hudi.table.format.cow.ParquetSplitReaderUtil.createWritableColumnVector(ParquetSplitReaderUtil.java:413)
> at org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.createWritableVectors(ParquetColumnarRowSplitReader.java:216)
> at org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.<init>(ParquetColumnarRowSplitReader.java:156)
> at org.apache.hudi.table.format.cow.ParquetSplitReaderUtil.genPartColumnarRowReader(ParquetSplitReaderUtil.java:151)
> at org.apache.hudi.table.format.cow.CopyOnWriteInputFormat.open(CopyOnWriteInputFormat.java:151)
> at org.apache.hudi.table.format.cow.CopyOnWriteInputFormat.open(CopyOnWriteInputFormat.java:68)
> at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:116)
> at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:128)
> at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:73)
> at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:333)
> {code}
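Inserting a field in the middle of a struct shifts every later field by one position, so a positional lookup no longer runs out of bounds; instead it pairs a requested field with a different file field whose Parquet type does not match, the same kind of mismatch reported as "Unexpected type: INT32". One way to avoid both failures, and to handle reordered fields, is to resolve struct fields by name and treat fields absent from the old file as missing (for example, read as nulls). The sketch below assumes that approach; Field and resolveByName are hypothetical helpers, and this is not necessarily how commit 014168b948a5a1ca88ea3e8cca213dfd46cef3f4 fixes the issue.

{code:java}
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: resolve struct fields by NAME instead of by position.
// Field and resolveByName are illustrative, not Hudi or Parquet APIs.
public class NameBasedStructLookup {

    static final class Field {
        final String name;
        final String type;

        Field(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // For each requested field, returns the index of the same-named field in the
    // file's struct, or -1 if the file predates that field (a reader could then
    // fill that child column with nulls).
    static int[] resolveByName(List<Field> fileFields, List<Field> requested) {
        Map<String, Integer> fileIndex = new HashMap<>();
        for (int i = 0; i < fileFields.size(); i++) {
            fileIndex.put(fileFields.get(i).name, i);
        }
        int[] mapping = new int[requested.size()];
        for (int i = 0; i < requested.size(); i++) {
            Integer idx = fileIndex.get(requested.get(i).name);
            mapping[i] = (idx == null) ? -1 : idx;
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Struct as written in the old Parquet file: struct<a BINARY, b INT32>
        List<Field> fileFields = Arrays.asList(new Field("a", "BINARY"), new Field("b", "INT32"));
        // Evolved schema with x inserted in the middle: struct<a BINARY, x BINARY, b INT32>.
        // A positional lookup would pair requested x (BINARY) with file field b (INT32).
        List<Field> requested = Arrays.asList(
                new Field("a", "BINARY"), new Field("x", "BINARY"), new Field("b", "INT32"));

        System.out.println(Arrays.toString(resolveByName(fileFields, requested))); // prints [0, -1, 1]
    }
}
{code}

Because the mapping is keyed on field names, the same code also covers appended fields (the missing field maps to -1) and reordered fields (indices follow the file's order).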
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)