[
https://issues.apache.org/jira/browse/HUDI-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Danny Chen reopened HUDI-5381:
------------------------------
> Class cast exception with Flink 1.15 source when reading table written using
> bulk insert
> ----------------------------------------------------------------------------------------
>
> Key: HUDI-5381
> URL: https://issues.apache.org/jira/browse/HUDI-5381
> Project: Apache Hudi
> Issue Type: Bug
> Components: flink
> Reporter: Kenneth William Krugler
> Assignee: Danny Chen
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.13.0
>
>
> When running a unit test that reads records written by a Flink workflow
> using MOR and bulk insert, the following exception occurs:
> {{22/12/13 08:22:04 WARN taskmanager.Task:1104 - Source: Hudi Source ->
> Conver Row Data To Enriched Netflow (1/1)#5
> (75289b6b98a35bc5d4522caaed3753a1) switched from RUNNING to FAILED with
> failure cause: java.lang.NoSuchMethodError:
> org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.nextRecord()Lorg/apache/flink/table/data/ColumnarRowData;
> at
> org.apache.hudi.table.format.mor.MergeOnReadInputFormat$BaseFileOnlyFilteringIterator.reachedEnd(MergeOnReadInputFormat.java:510)
> at
> org.apache.hudi.table.format.mor.MergeOnReadInputFormat.reachedEnd(MergeOnReadInputFormat.java:245)
> at
> org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:89)
> at
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110)
> at
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67)
> at
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:332)
> }}
> MergeOnReadInputFormat (which is part of hudi-flink) uses
> ParquetColumnarRowSplitReader to get the next record, and each Flink-specific
> jar (hudi-flink1.13x, 1.14x, 1.15x) has its own version of
> ParquetColumnarRowSplitReader.
> Unfortunately the nextRecord() method in this class returns Flink's
> ColumnarRowData, which changed packages between Flink 1.14 and 1.15. So when
> you use MergeOnReadInputFormat with Flink 1.15, you get a class cast error,
> because (I assume) MergeOnReadInputFormat was compiled against 1.13 or 1.14,
> where the result type from nextRecord() is
> org.apache.flink.table.data.ColumnarRowData, not
> org.apache.flink.table.data.columnar.ColumnarRowData.
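To make the relocation concrete: a caller compiled against the pre-1.15 location of ColumnarRowData embeds that package name in the method's binary signature, so running it against a 1.15 classpath raises NoSuchMethodError even though a method of the same name exists. The sketch below is a minimal, hypothetical diagnostic helper (the class and method names are mine, not Hudi's or Flink's) that probes which package the running Flink actually provides the class in:

```java
// Hypothetical helper (not part of Hudi) illustrating the package relocation:
// ColumnarRowData lives in org.apache.flink.table.data up to Flink 1.14 and in
// org.apache.flink.table.data.columnar from Flink 1.15 on. A caller compiled
// against the old location fails with NoSuchMethodError at runtime on 1.15,
// because the return type's package is part of the method's binary signature.
public class ColumnarRowDataLocator {

    /** Returns the first class name loadable on the current classpath, or null. */
    public static String firstPresent(String... classNames) {
        for (String name : classNames) {
            try {
                // initialize=false: only check presence, don't run static initializers
                Class.forName(name, false, ColumnarRowDataLocator.class.getClassLoader());
                return name;
            } catch (ClassNotFoundException ignored) {
                // not on the classpath; try the next candidate
            }
        }
        return null;
    }

    /** Which package the running Flink provides ColumnarRowData in (null if Flink is absent). */
    public static String locateColumnarRowData() {
        return firstPresent(
            "org.apache.flink.table.data.columnar.ColumnarRowData", // Flink 1.15+
            "org.apache.flink.table.data.ColumnarRowData");         // Flink <= 1.14
    }
}
```

A check like this could fail fast with a clear message at job startup instead of surfacing as a NoSuchMethodError mid-read; the actual fix, of course, is for each hudi-flink1.1xx bundle to be compiled against its matching Flink version.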
> This error didn't happen previously when the Flink workflow was using insert
> (versus bulk insert) to write the table...though I haven't dug into how this
> is related.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)