[ 
https://issues.apache.org/jira/browse/HUDI-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianghu Wang resolved HUDI-1919.
--------------------------------
    Resolution: Fixed

Fixed via master : aba1eadbfc015095f31271b2648faa7023126b99

> Type mismatch when streaming read copy_on_write table using flink
> -----------------------------------------------------------------
>
>                 Key: HUDI-1919
>                 URL: https://issues.apache.org/jira/browse/HUDI-1919
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Flink Integration
>         Environment: Hudi version : 0.9.0-SNAPSHOT
> Flink version : 1.12.2
> Hadoop version : 2.9.2
> Storage (HDFS/S3/GCS..) : HDFS
>            Reporter: Well Tang
>            Assignee: Well Tang
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>         Attachments: image-2021-05-22-00-02-03-762.png, 
> image-2021-05-22-00-02-41-706.png
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> *Problem overview*:
> The exception occurs when the specified partition column is not the last 
> field in the sequence of fields written to the Hudi table (type 
> *COW*).
> For example, suppose the fields (including partition columns) are written 
> to the Hudi table in the order col1, col2, col3. If the partition column 
> is col1, the exception is thrown. If the partition column is col3, the 
> read works.
>  
> *The setup and symptoms of this problem are as follows:*
> First, we register a Hudi table (type: *COW*) with a DDL statement in 
> Flink SQL in a *real-time* *computing* application. The exception arises 
> in two cases:
> 【case-1】 When querying *some* columns of the Hudi table (for example: select 
> col1,col2 from table), the bug occurs only *sometimes*; an exception is not 
> always thrown.
> 【case-2】 When querying *all* the fields of the Hudi table (for example: select * 
> from table), the bug *always* occurs.
>  
> *The exception stack is as follows:*
> !image-2021-05-22-00-02-03-762.png!
> *The local debugging is as follows:*
> !image-2021-05-22-00-02-41-706.png!
> The location_type field is a partition field, and it is not at the end of the 
> field order. This causes the field names and field data types to be 
> mismatched in subsequent processing.
>  
> *Initial diagnosis*:
> When a Hudi table is read through Flink, 
> org.apache.hudi.table.format.cow.ParquetSplitReaderUtil#genPartColumnarRowReader
>  is called. In the *ParquetColumnarRowSplitReader* object that this method 
> returns, the *selectedTypes* and *selectedFieldNames* arrays are 
> misaligned.
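
The misalignment described in the diagnosis can be pictured with a small standalone sketch. This is not the Hudi code and does not reproduce the actual fix. It assumes, purely for illustration, that a reader orders field names with partition fields last while the types stay in table order. The class name, both helper methods, and the column types are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MisalignedSelectionSketch {

    // Hypothetical helper: orders field names so that the partition field comes last.
    static String[] partitionLast(String[] names, String partitionField) {
        List<String> ordered = new ArrayList<>();
        for (String name : names) {
            if (!name.equals(partitionField)) {
                ordered.add(name);
            }
        }
        for (String name : names) {
            if (name.equals(partitionField)) {
                ordered.add(name);
            }
        }
        return ordered.toArray(new String[0]);
    }

    // Hypothetical helper: looks up each type by field name so that
    // types[i] always describes orderedNames[i].
    static String[] typesFor(String[] orderedNames, String[] fullNames, String[] fullTypes) {
        List<String> full = Arrays.asList(fullNames);
        String[] types = new String[orderedNames.length];
        for (int i = 0; i < orderedNames.length; i++) {
            types[i] = fullTypes[full.indexOf(orderedNames[i])];
        }
        return types;
    }

    public static void main(String[] args) {
        // Assumed table layout; the types are made up for the example.
        String[] fullNames = {"col1", "col2", "col3"};
        String[] fullTypes = {"STRING", "INT", "DOUBLE"};

        // Partition column col1 is not the last field, so the name order changes.
        String[] selectedNames = partitionLast(fullNames, "col1");

        // Misaligned: types are left in table order while names were reordered.
        String[] misalignedTypes = fullTypes;

        // Aligned: types are derived from the reordered names.
        String[] alignedTypes = typesFor(selectedNames, fullNames, fullTypes);

        System.out.println("names:      " + Arrays.toString(selectedNames));
        System.out.println("misaligned: " + Arrays.toString(misalignedTypes));
        System.out.println("aligned:    " + Arrays.toString(alignedTypes));
    }
}
```

With col3 as the partition column the name order does not change, so both type arrays agree in this sketch, which is consistent with the report that the read works in that case.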



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
