[GitHub] [hudi] AirToSupply opened a new issue #2976: [SUPPORT] Column misalignment occurs when reading the COPY_ON_WRITE type of hudi table through Flink

GitBox Fri, 21 May 2021 03:51:16 -0700


AirToSupply opened a new issue #2976:
URL: https://github.com/apache/hudi/issues/2976

**To Reproduce**

Steps to reproduce the behavior:

1. Build from source on branch [master]; the version is 0.9.0-SNAPSHOT.
2. Start a Flink 1.12.x streaming job that reads data from a hudi table.
3. Observe the running job: the exception causes the Flink job to fail.

**Expected behavior**
The job should read the table without errors. Instead, it fails with:

java.lang.IllegalArgumentException: Unexpected type: ...

**Environment Description**

* Hudi version : 0.9.0-SNAPSHOT

* Spark version : None

* Hive version : None

* Hadoop version : 2.9.2

* Storage (HDFS/S3/GCS..) : HDFS

* Running on Docker? (yes/no) : no

**Additional context**

The exception occurs when the partition column is not the last field in the
sequence of fields written to the hudi table.

For example, suppose the fields (including partition columns) are written to
the hudi table in the order col1, col2, col3. If the partition column is col1,
the exception is thrown. If the partition column is col3, the read works
normally.


**Stacktrace**

The exception stack is as follows:

![BB0B7B65-BC82-40da-ABD9-6550956AAFDD](https://user-images.githubusercontent.com/62897740/119125433-588c0780-ba64-11eb-9bb6-1fad46a2a3b5.png)

The local debugging is as follows:

![C10E0226-BBAD-4ef3-B3AE-161586449B35](https://user-images.githubusercontent.com/62897740/119125566-82452e80-ba64-11eb-81ab-3576fc4ff97b.png)

Initial diagnosis: when reading the hudi table through Flink,
org.apache.hudi.table.format.cow.ParquetSplitReaderUtil#genPartColumnarRowReader
is called. In the ParquetColumnarRowSplitReader object that this method
returns, the selectedTypes and selectedFieldNames arrays are misaligned.
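
To make the suspected misalignment concrete, here is a minimal, self-contained sketch. It is not Hudi code. The class `ColumnAlignmentSketch`, both helper methods, and the column types are hypothetical, and the assumption that partition column types end up after the data column types is only one way such a shift could arise, not a confirmed description of `genPartColumnarRowReader`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnAlignmentSketch {

    /**
     * Hypothetical helper: returns the types of the non-partition columns in
     * table order, followed by the types of the partition columns. This mimics
     * a reader that places partition columns after the data columns.
     */
    static String[] typesWithPartitionsLast(String[] names, String[] types,
                                            List<String> partitionCols) {
        List<String> ordered = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            if (!partitionCols.contains(names[i])) {
                ordered.add(types[i]);
            }
        }
        for (int i = 0; i < names.length; i++) {
            if (partitionCols.contains(names[i])) {
                ordered.add(types[i]);
            }
        }
        return ordered.toArray(new String[0]);
    }

    /** Returns the field names whose type at the same index differs from the declared type. */
    static List<String> misaligned(String[] names, String[] declaredTypes,
                                   String[] selectedTypes) {
        List<String> bad = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            if (!declaredTypes[i].equals(selectedTypes[i])) {
                bad.add(names[i]);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        String[] names = {"col1", "col2", "col3"};
        String[] types = {"STRING", "INT", "DOUBLE"}; // assumed types, for illustration only

        // Partition column is the last field: names and types stay aligned.
        System.out.println(misaligned(names, types,
                typesWithPartitionsLast(names, types, Arrays.asList("col3"))));

        // Partition column is the first field: every type shifts by one position.
        System.out.println(misaligned(names, types,
                typesWithPartitionsLast(names, types, Arrays.asList("col1"))));
    }
}
```

Under these assumptions the first call prints `[]` and the second prints `[col1, col2, col3]`, which matches the observation that only a trailing partition column reads correctly.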

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
