[ https://issues.apache.org/jira/browse/HUDI-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646633#comment-17646633 ]

Kenneth William Krugler commented on HUDI-5381:
-----------------------------------------------

I can fix this in several ways, the easiest being to have 
ParquetColumnarRowSplitReader.nextRecord() return a RowData, an interface that 
hasn't moved between Flink versions. Thoughts on this approach? I've got a fork 
with this fix, and using the resulting jar in my workflow no longer triggers 
the exception. The one downside is that the nextRecord() result is no longer 
the richer ColumnarRowData type, in case anyone needs that.
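
A minimal, self-contained sketch of the idea (the types below are stand-ins, not the real Flink/Hudi classes): if nextRecord() is declared to return the stable RowData interface, callers link against a method descriptor that never mentions the concrete class, so the concrete class is free to change packages between Flink versions without breaking compiled callers.

```java
// Stand-in for org.apache.flink.table.data.RowData (the stable interface).
interface RowData {
    int getInt(int pos);
}

// Stand-in for the concrete ColumnarRowData, the class whose package moved
// between Flink 1.14 and 1.15.
final class ColumnarRowData implements RowData {
    private final int[] values;
    ColumnarRowData(int... values) { this.values = values; }
    @Override public int getInt(int pos) { return values[pos]; }
}

// Stand-in reader. The fix: the declared return type is RowData, so code
// compiled against any Flink version resolves nextRecord() to the same
// method descriptor, regardless of where the concrete class lives.
final class ParquetColumnarRowSplitReader {
    private final ColumnarRowData row = new ColumnarRowData(42);
    public RowData nextRecord() { return row; }
}

public class Main {
    public static void main(String[] args) {
        RowData r = new ParquetColumnarRowSplitReader().nextRecord();
        System.out.println(r.getInt(0)); // prints 42
    }
}
```

The NoSuchMethodError in the report arises because the JVM resolves the exact return-type descriptor baked into the caller's bytecode at compile time; returning the interface keeps that descriptor constant.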

> Class cast exception with Flink 1.15 source when reading table written using 
> bulk insert
> ----------------------------------------------------------------------------------------
>
>                 Key: HUDI-5381
>                 URL: https://issues.apache.org/jira/browse/HUDI-5381
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: flink
>            Reporter: Kenneth William Krugler
>            Priority: Critical
>
> When running a unit test that reads records which were written by a Flink 
> workflow using MOR and bulk insert, the following exception occurs:
> {{22/12/13 08:22:04 WARN taskmanager.Task:1104 - Source: Hudi Source -> 
> Conver Row Data To Enriched Netflow (1/1)#5 
> (75289b6b98a35bc5d4522caaed3753a1) switched from RUNNING to FAILED with 
> failure cause: java.lang.NoSuchMethodError: 
> org.apache.hudi.table.format.cow.vector.reader.ParquetColumnarRowSplitReader.nextRecord()Lorg/apache/flink/table/data/ColumnarRowData;
>       at 
> org.apache.hudi.table.format.mor.MergeOnReadInputFormat$BaseFileOnlyFilteringIterator.reachedEnd(MergeOnReadInputFormat.java:510)
>       at 
> org.apache.hudi.table.format.mor.MergeOnReadInputFormat.reachedEnd(MergeOnReadInputFormat.java:245)
>       at 
> org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:89)
>       at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110)
>       at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:67)
>       at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:332)
> }}
> MergeOnReadInputFormat (which is part of hudi-flink) uses 
> ParquetColumnarRowSplitReader to get the next record, and each Flink-specific 
> jar (hudi-flink1.13x, 1.14x, 1.15x) has its own version of 
> ParquetColumnarRowSplitReader.
> Unfortunately the nextRecord() method in this class returns Flink’s 
> ColumnarRowData, which changed packages between Flink 1.14 and 1.15. So when 
> you use MergeOnReadInputFormat with Flink 1.15, you get the NoSuchMethodError 
> above, because (I assume) MergeOnReadInputFormat was compiled against 1.13 or 
> 1.14, where the result type from nextRecord() is 
> org.apache.hudi.table.data.ColumnarRowData, not 
> org.apache.flink.table.data.columnar.ColumnarRowData.
> This error didn't happen previously when the Flink workflow was using insert 
> (versus bulk insert) to write the table...though I haven't dug into how this 
> is related.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
