tangchenhao created HUDI-1919:
---------------------------------
Summary: Column misalignment occurs when reading the COPY_ON_WRITE
type of hudi table through Flink
Key: HUDI-1919
URL: https://issues.apache.org/jira/browse/HUDI-1919
Project: Apache Hudi
Issue Type: Bug
Components: Flink Integration
Environment: Hudi version : 0.9.0-SNAPSHOT
Flink version : 1.12.2
Hadoop version : 2.9.2
Storage (HDFS/S3/GCS..) : HDFS
Reporter: tangchenhao
Fix For: 0.9.0
Attachments: image-2021-05-22-00-02-03-762.png,
image-2021-05-22-00-02-41-706.png
The exception occurs when the specified partition column is not the last
field in the sequence of fields written to the Hudi table.
For example, suppose the fields (including partition columns) are written to
the Hudi table in the order col1, col2, col3. If the partition column is
col1, the exception is thrown. If the partition column is col3, the read
works normally.
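
The condition above can be written as a small check. This is only a sketch of the reported trigger, not code from Hudi; the class and method names (PartitionOrderCheck, partitionColumnsAreLast) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hudi: reports whether every partition
// column sits at the end of the written field order.
final class PartitionOrderCheck {

    static boolean partitionColumnsAreLast(List<String> fields, List<String> partitionColumns) {
        Set<String> partitions = new HashSet<>(partitionColumns);
        boolean seenPartition = false;
        for (String field : fields) {
            if (partitions.contains(field)) {
                seenPartition = true;
            } else if (seenPartition) {
                // A regular column follows a partition column, which is the
                // layout the report describes as failing.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("col1", "col2", "col3");
        // Partition column first: the layout reported as failing.
        System.out.println(partitionColumnsAreLast(fields, Arrays.asList("col1")));
        // Partition column last: the layout reported as working.
        System.out.println(partitionColumnsAreLast(fields, Arrays.asList("col3")));
    }
}
```
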
The exception stack is as follows:
!image-2021-05-22-00-02-03-762.png!
Local debugging shows the following:
!image-2021-05-22-00-02-41-706.png!
The location_type field is a partition field.
*Initial diagnosis*:
Reading the Hudi table through Flink calls
org.apache.hudi.table.format.cow.ParquetSplitReaderUtil#genPartColumnarRowReader.
In the ParquetColumnarRowSplitReader object that this method returns, the
selectedTypes and selectedFieldNames arrays are misaligned.
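
To illustrate what misalignment means here: the two arrays are parallel, so entry i of one must describe the same column as entry i of the other. The sketch below does not reproduce Hudi's code. It assumes, purely for illustration, that the field names are reordered so the partition column comes last while the types keep the written order. Every name in it (AlignmentSketch, movePartitionLast, pair) and the column types are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not Hudi code.
final class AlignmentSketch {

    // Moves the partition column to the end, keeping the order of the rest.
    static List<String> movePartitionLast(List<String> names, String partitionColumn) {
        List<String> reordered = new ArrayList<>(names);
        if (reordered.remove(partitionColumn)) {
            reordered.add(partitionColumn);
        }
        return reordered;
    }

    // Pairs names and types by index, the way parallel arrays are read.
    static List<String> pair(List<String> names, List<String> types) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            pairs.add(names.get(i) + ":" + types.get(i));
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> writtenNames = Arrays.asList("col1", "col2", "col3");
        List<String> writtenTypes = Arrays.asList("STRING", "INT", "BIGINT");

        // Assumed failure mode: names are reordered, types are not.
        List<String> selectedNames = movePartitionLast(writtenNames, "col1");
        System.out.println(pair(selectedNames, writtenTypes));

        // With col3 as the partition column the order does not change,
        // so names and types still line up.
        System.out.println(pair(movePartitionLast(writtenNames, "col3"), writtenTypes));
    }
}
```

Under this assumption, a partition column of col1 pairs every name with another column's type, while a partition column of col3 leaves the pairing intact, which matches the behaviour described in the report.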