zhang-yue1 opened a new issue, #14297:
URL: https://github.com/apache/hudi/issues/14297
### Describe the problem you faced
I am seeing the following error when reading a Hudi table via Hive/Tez:
Caused by: java.lang.RuntimeException: java.io.IOException: org.apache.hudi.org.apache.avro.AvroRuntimeException: Duplicate field fld_create_date in record hoodie.hudi_ods_es_charge_incoming_data_df.hudi_ods_es_charge_incoming_data_df_record: fld_create_date type:UNION pos:2 and fld_create_date type:UNION pos:0.
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:152)
    at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
    ... 16 more
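To narrow down where the duplicate comes from, a small stdlib-only Python check can list repeated field names and the positions where they occur in an Avro record schema. This is a sketch, not a Hudi or Avro API: find_duplicate_fields is a hypothetical helper. It assumes the schema has already been dumped to a JSON string; how you obtain it depends on your setup. The example schema is made up to mirror the shape of the error message. Real field types may differ.

```python
import json
from collections import defaultdict


def find_duplicate_fields(schema_json: str, case_insensitive: bool = False) -> dict[str, list[int]]:
    """Map each field name that occurs more than once in the top-level
    "fields" array of an Avro record schema to the positions where it occurs.

    Hypothetical diagnostic helper; not part of Hudi or Avro.
    """
    schema = json.loads(schema_json)
    positions: dict[str, list[int]] = defaultdict(list)
    for pos, field in enumerate(schema.get("fields", [])):
        name = field["name"]
        # Avro itself compares field names case-sensitively; the optional
        # case-insensitive mode exists because Hive lower-cases column names.
        key = name.lower() if case_insensitive else name
        positions[key].append(pos)
    # Keep only names seen at two or more positions.
    return {name: found for name, found in positions.items() if len(found) > 1}


# Made-up schema shaped like the error message: the same field at
# pos 0 and pos 2, both declared as nullable unions.
example = json.dumps({
    "type": "record",
    "name": "hudi_ods_es_charge_incoming_data_df_record",
    "namespace": "hoodie.hudi_ods_es_charge_incoming_data_df",
    "fields": [
        {"name": "fld_create_date", "type": ["null", "string"]},
        {"name": "id", "type": "string"},
        {"name": "fld_create_date", "type": ["null", "string"]},
    ],
})

print(find_duplicate_fields(example))  # {'fld_create_date': [0, 2]}
```

Running the same check with case_insensitive=True on the schema Hive sees can also reveal names that differ only in letter case. Avro would accept such names, but they collapse into one column once Hive lower-cases them.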
### To Reproduce
1. Use Spark to sync historical data and Flink for real-time writes. The fld_create_date field type has been confirmed to be consistent between the two writers.
2. Perform offline compaction on the Hudi table.
3. Query the Hudi table using Hive on Tez. This triggers the error: Duplicate field fld_create_date.
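Step 1 relies on fld_create_date having the same type in the Spark and Flink writer schemas. A stdlib-only Python sketch can compare two Avro record schemas field by field to double-check this. It assumes both schemas are available as JSON strings. field_type_mismatches is a hypothetical helper. The two example schemas are invented: the timestamp-micros vs timestamp-millis difference only illustrates what the check would report, not what this table contains.

```python
import json


def field_type_mismatches(schema_a: str, schema_b: str) -> dict[str, tuple]:
    """Compare the top-level fields of two Avro record schemas by name.

    Returns {name: (type_in_a, type_in_b)} for every name whose declared type
    differs; None stands in for a field missing from one side.
    Hypothetical diagnostic helper, not a Hudi API.
    """
    # If a schema repeats a field name, the last occurrence wins here.
    fields_a = {f["name"]: f["type"] for f in json.loads(schema_a)["fields"]}
    fields_b = {f["name"]: f["type"] for f in json.loads(schema_b)["fields"]}
    mismatches = {}
    for name in sorted(fields_a.keys() | fields_b.keys()):
        type_a = fields_a.get(name)
        type_b = fields_b.get(name)
        # Types are compared literally, including union branch order
        # and logical-type attributes.
        if type_a != type_b:
            mismatches[name] = (type_a, type_b)
    return mismatches


# Invented schemas for illustration only.
spark_schema = json.dumps({"type": "record", "name": "r", "fields": [
    {"name": "id", "type": "string"},
    {"name": "fld_create_date", "type": ["null", {"type": "long", "logicalType": "timestamp-micros"}]},
]})
flink_schema = json.dumps({"type": "record", "name": "r", "fields": [
    {"name": "id", "type": "string"},
    {"name": "fld_create_date", "type": ["null", {"type": "long", "logicalType": "timestamp-millis"}]},
]})

print(field_type_mismatches(spark_schema, spark_schema))  # {}
print(list(field_type_mismatches(spark_schema, flink_schema)))  # ['fld_create_date']
```

The literal comparison is deliberate. In Avro, ["null", "long"] and ["long", "null"] are different union schemas, so a check that normalized branch order could hide a real difference between the writers.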
### Expected behavior
When reading the Hudi table in Hive/Tez, there should be no duplicate field error.
The query or scan should return the data correctly, with the fld_create_date field appearing only once.
### Environment Description
* Hudi version: 1.0.2
* Spark version: 3.3.3
* Flink version: 1.14.5
* Hive version: 3.1.0
* Hadoop version: 3.1.0
* Storage (HDFS/S3/GCS..): hdfs
* Running on Docker? (yes/no): no
### Additional context
_No response_
### Stacktrace
```shell
```