[
https://issues.apache.org/jira/browse/HUDI-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Well Tang updated HUDI-1919:
----------------------------
Description:
*Problem overview*:
The exception occurs when the specified partition column is not the last field
in the sequence of fields written to the Hudi table.
For example, suppose the fields (including partition columns) are written to
the Hudi table in the order col1, col2, col3. If the partition column is col1,
the exception is thrown. If the partition column is col3, the read works.
*Hypothesis and observed behavior:*
First, register a Hudi table with a DDL statement in Flink SQL. The exception
can then arise in two cases:
[*case-1*]: When querying *some* columns of the Hudi table (for example: select
col1, col2 from table), the bug occurs only *sometimes* and does not
necessarily cause an exception.
[*case-2*]: When querying *all* fields of the Hudi table (for example: select *
from table), the bug *is bound to* occur.
*The exception stack is as follows:*
!image-2021-05-22-00-02-03-762.png!
*The local debugging is as follows:*
!image-2021-05-22-00-02-41-706.png!
The location_type field is a partition field. Because it is not at the end of
the field order, the field names and field data types become misplaced in
subsequent processing.
*Initial diagnosis*:
When a Hudi table is read through Flink,
org.apache.hudi.table.format.cow.ParquetSplitReaderUtil#genPartColumnarRowReader
is called. In the *ParquetColumnarRowSplitReader* object that this method
returns, the *selectedTypes* and *selectedFieldNames* arrays are misaligned.
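The kind of misalignment described above can be illustrated with a small, self-contained sketch. This is not the Hudi code: the class name, the schema, the type names, and all three helper methods are hypothetical. The sketch assumes that a reader moves partition fields to the end of the name list while the type list stays in table order, which is only one way such parallel arrays could drift apart.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnAlignmentSketch {

    // Hypothetical table schema, in the order the fields were written.
    static final String[] FIELD_NAMES = {"col1", "col2", "col3"};
    static final String[] FIELD_TYPES = {"STRING", "INT", "DOUBLE"};

    /** Returns non-partition fields first, then partition fields (assumed reordering). */
    public static List<String> dataFieldsThenPartitionFields(List<String> partitionKeys) {
        List<String> ordered = new ArrayList<>();
        for (String name : FIELD_NAMES) {
            if (!partitionKeys.contains(name)) {
                ordered.add(name);
            }
        }
        for (String name : FIELD_NAMES) {
            if (partitionKeys.contains(name)) {
                ordered.add(name);
            }
        }
        return ordered;
    }

    /** Pairs reordered names with types left in table order: misaligned unless the orders match. */
    public static String[] pairByPosition(List<String> names) {
        String[] pairs = new String[names.size()];
        for (int i = 0; i < names.size(); i++) {
            pairs[i] = names.get(i) + ":" + FIELD_TYPES[i];
        }
        return pairs;
    }

    /** Looks up each type through the name's index in the schema, so the pairs stay aligned. */
    public static String[] pairByName(List<String> names) {
        List<String> schema = Arrays.asList(FIELD_NAMES);
        String[] pairs = new String[names.size()];
        for (int i = 0; i < names.size(); i++) {
            pairs[i] = names.get(i) + ":" + FIELD_TYPES[schema.indexOf(names.get(i))];
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Partition column is the first field: positional pairing goes wrong.
        List<String> partitionFirst = dataFieldsThenPartitionFields(Arrays.asList("col1"));
        System.out.println(Arrays.toString(pairByPosition(partitionFirst)));
        System.out.println(Arrays.toString(pairByName(partitionFirst)));

        // Partition column is the last field: the reordering is a no-op.
        List<String> partitionLast = dataFieldsThenPartitionFields(Arrays.asList("col3"));
        System.out.println(Arrays.toString(pairByPosition(partitionLast)));
    }
}
```

In this sketch, choosing col3 as the partition column leaves the name order unchanged, so positional pairing happens to be correct. That matches the observation that the read works when the partition column is the last field.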
was:
The exception occurs when the specified partition column is not the last field
in the sequence of fields written to the Hudi table.
For example, suppose the fields (including partition columns) are written to
the Hudi table in the order col1, col2, col3. If the partition column is col1,
the exception is thrown. If the partition column is col3, the read works.
The exception stack is as follows:
!image-2021-05-22-00-02-03-762.png!
The local debugging is as follows:
!image-2021-05-22-00-02-41-706.png!
The location_type field is a partition field. Because it is not at the end of
the field order, the field names and field data types become misplaced in
subsequent processing.
*Initial diagnosis*:
When a Hudi table is read through Flink,
org.apache.hudi.table.format.cow.ParquetSplitReaderUtil#genPartColumnarRowReader
is called. In the *ParquetColumnarRowSplitReader* object that this method
returns, the *selectedTypes* and *selectedFieldNames* arrays are misaligned.
> Fix column misalignment occurs when reading the copy_on_write type of hudi
> table through Flink
> ----------------------------------------------------------------------------------------------
>
> Key: HUDI-1919
> URL: https://issues.apache.org/jira/browse/HUDI-1919
> Project: Apache Hudi
> Issue Type: Bug
> Components: Flink Integration
> Environment: Hudi version : 0.9.0-SNAPSHOT
> Flink version : 1.12.2
> Hadoop version : 2.9.2
> Storage (HDFS/S3/GCS..) : HDFS
> Reporter: Well Tang
> Assignee: Well Tang
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.9.0
>
> Attachments: image-2021-05-22-00-02-03-762.png,
> image-2021-05-22-00-02-41-706.png
>
> Original Estimate: 96h
> Remaining Estimate: 96h
>