[jira] [Updated] (HUDI-1919) Column misalignment occurs when reading the copy_on_write type of hudi table through Flink

Well Tang (Jira) Fri, 21 May 2021 18:38:09 -0700


     [ 
https://issues.apache.org/jira/browse/HUDI-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Well Tang updated HUDI-1919:
----------------------------
    Description: 
The exception occurs when the specified partition column is not the last 
field in the sequence of fields written to the Hudi table.

For example, suppose the fields (including partition columns) are written 
to the Hudi table in the order col1, col2, col3. If the partition column 
is col1, the exception is thrown. If the partition column is col3, the 
read works normally.

 

The exception stack is as follows:

!image-2021-05-22-00-02-03-762.png!

The local debugging is as follows:

!image-2021-05-22-00-02-41-706.png!

The location_type field is a partition field, and it is not at the end of the 
field order.

*Initial diagnosis*:

When reading the Hudi table through Flink, 
org.apache.hudi.table.format.cow.ParquetSplitReaderUtil#genPartColumnarRowReader
 is called. In the ParquetColumnarRowSplitReader object that this method 
returns, the selectedTypes and selectedFieldNames arrays are misaligned.
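
To make the diagnosis above concrete, here is a minimal sketch of one way 
a names array and a types array can drift apart only when the partition 
column is not the last field. It is not the Hudi code: the class 
ColumnAlignmentSketch, its methods, and the column types are hypothetical, 
and the "take the first n types" step is an assumed cause chosen because it 
reproduces the reported symptom, not a confirmed root cause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: these names are hypothetical and are not Hudi's API.
class ColumnAlignmentSketch {
    // Fields in the order they were written; one of them is the partition column.
    static final List<String> FIELD_NAMES = Arrays.asList("col1", "col2", "col3");
    // Assumed types, invented for this example.
    static final List<String> FIELD_TYPES = Arrays.asList("STRING", "INT", "DOUBLE");

    // Names of the non-partition columns, kept in table order.
    static List<String> dataColumnNames(String partitionColumn) {
        List<String> names = new ArrayList<>(FIELD_NAMES);
        names.remove(partitionColumn);
        return names;
    }

    // Fragile pairing: takes the first n types, which silently assumes
    // that the partition column is the last field.
    static List<String> typesByPrefix(String partitionColumn) {
        int n = dataColumnNames(partitionColumn).size();
        return new ArrayList<>(FIELD_TYPES.subList(0, n));
    }

    // Aligned pairing: resolves each type through the index of its own name.
    static List<String> typesByName(String partitionColumn) {
        List<String> types = new ArrayList<>();
        for (String name : dataColumnNames(partitionColumn)) {
            types.add(FIELD_TYPES.get(FIELD_NAMES.indexOf(name)));
        }
        return types;
    }

    public static void main(String[] args) {
        // col3 (last field) lines up either way; col1 (first field) does not.
        for (String partition : Arrays.asList("col3", "col1")) {
            System.out.println("partition=" + partition
                + " names=" + dataColumnNames(partition)
                + " prefixTypes=" + typesByPrefix(partition)
                + " alignedTypes=" + typesByName(partition));
        }
    }
}
```

With col3 as the partition column both pairings agree. With col1, the 
prefix-based pairing matches col2 and col3 with the types of col1 and col2, 
which is the kind of shift described above. Looking each type up by field 
name keeps the two arrays aligned regardless of where the partition column 
sits.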

  was:
The exception occurs when the specified partition column is not the last 
field in the sequence of fields written to the Hudi table.

For example, suppose the fields (including partition columns) are written 
to the Hudi table in the order col1, col2, col3. If the partition column 
is col1, the exception is thrown. If the partition column is col3, the 
read works normally.

 

The exception stack is as follows:

!image-2021-05-22-00-02-03-762.png!

The local debugging is as follows:

!image-2021-05-22-00-02-41-706.png!

The location_type field is a partition field.

*Initial diagnosis*:

When reading the Hudi table through Flink, 
org.apache.hudi.table.format.cow.ParquetSplitReaderUtil#genPartColumnarRowReader
 is called. In the ParquetColumnarRowSplitReader object that this method 
returns, the selectedTypes and selectedFieldNames arrays are misaligned.


> Column misalignment occurs when reading the copy_on_write type of hudi table 
> through Flink
> ------------------------------------------------------------------------------------------
>
>                 Key: HUDI-1919
>                 URL: https://issues.apache.org/jira/browse/HUDI-1919
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Flink Integration
>         Environment: Hudi version : 0.9.0-SNAPSHOT
> Flink version : 1.12.2
> Hadoop version : 2.9.2
> Storage (HDFS/S3/GCS..) : HDFS
>            Reporter: Well Tang
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>         Attachments: image-2021-05-22-00-02-03-762.png, 
> image-2021-05-22-00-02-41-706.png
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> The exception occurs when the specified partition column is not the last 
> field in the sequence of fields written to the Hudi table.
> For example, suppose the fields (including partition columns) are written 
> to the Hudi table in the order col1, col2, col3. If the partition column 
> is col1, the exception is thrown. If the partition column is col3, the 
> read works normally.
>
> The exception stack is as follows:
> !image-2021-05-22-00-02-03-762.png!
> The local debugging is as follows:
> !image-2021-05-22-00-02-41-706.png!
> The location_type field is a partition field, and it is not at the end of the 
> field order.
> *Initial diagnosis*:
> When reading the Hudi table through Flink, 
> org.apache.hudi.table.format.cow.ParquetSplitReaderUtil#genPartColumnarRowReader
>  is called. In the ParquetColumnarRowSplitReader object that this method 
> returns, the selectedTypes and selectedFieldNames arrays are misaligned.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
