zhang-yue1 opened a new issue, #14297:
URL: https://github.com/apache/hudi/issues/14297

   ### Describe the problem you faced
   
   I am seeing the following error when reading a Hudi table via Hive/Tez:
   
   Caused by: java.lang.RuntimeException: java.io.IOException: org.apache.hudi.org.apache.avro.AvroRuntimeException: Duplicate field fld_create_date in record hoodie.hudi_ods_es_charge_incoming_data_df.hudi_ods_es_charge_incoming_data_df_record: fld_create_date type:UNION pos:2 and fld_create_date type:UNION pos:0.
                at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
                at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:152)
                at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
                at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
                at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426)
                at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
                ... 16 more
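
The message reports the same field name at two positions (pos:2 and pos:0) in one record schema. As a diagnostic aid only, here is a minimal sketch that scans an Avro record schema (as JSON) for top-level field names that collide, either exactly or when compared case-insensitively, since Hive treats column names case-insensitively. The function name `find_duplicate_fields` and the sample schema are hypothetical and not taken from the reporter's table; this does not establish the root cause of the error above.

```python
import json
from collections import defaultdict


def find_duplicate_fields(schema_json, case_insensitive=True):
    """Return {field_name: [positions]} for top-level fields listed more than once.

    Hypothetical helper: it only inspects the "fields" array of a record schema.
    """
    schema = json.loads(schema_json)
    positions = defaultdict(list)
    for pos, field in enumerate(schema.get("fields", [])):
        name = field["name"]
        # Lowercase the key to mimic a case-insensitive reader such as Hive.
        key = name.lower() if case_insensitive else name
        positions[key].append(pos)
    return {name: pos for name, pos in positions.items() if len(pos) > 1}


# Hypothetical schema for illustration; not the reporter's actual table schema.
example_schema = json.dumps({
    "type": "record",
    "name": "example_record",
    "fields": [
        {"name": "fld_create_date", "type": ["null", "string"], "default": None},
        {"name": "fld_id", "type": ["null", "string"], "default": None},
        {"name": "FLD_CREATE_DATE", "type": ["null", "string"], "default": None},
    ],
})

print(find_duplicate_fields(example_schema))
```

Running the same check against the schema actually used by the Hive reader (assuming it can be exported as JSON) would show whether the duplicate is already present in the table schema or is only introduced at read time.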
   
   
   
   ### To Reproduce
   
   1. Use Spark to sync historical data and Flink for real-time writes. The type of the fld_create_date field is confirmed to be consistent across both writers.
   
   2. Perform offline compaction on the Hudi table.
   
   3. Query the Hudi table using Hive on Tez. The query fails with the duplicate field error: Duplicate field fld_create_date.
   
   ### Expected behavior
   
   When reading the Hudi table in Hive/Tez, there should be no duplicate field error.
   The query or scan should return the data correctly, with the fld_create_date field present only once.
   
   ### Environment Description
   
   * Hudi version: 1.0.2
   * Spark version: 3.3.3
   * Flink version: 1.14.5
   * Hive version: 3.1.0
   * Hadoop version: 3.1.0
   * Storage (HDFS/S3/GCS..): HDFS
   * Running on Docker? (yes/no): no
   
   
   ### Additional context
   
   _No response_
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
