voonhous commented on issue #8892:
URL: https://github.com/apache/hudi/issues/8892#issuecomment-1643039551

   @big-doudou Apologies for the late reply. I was trying to reproduce this 
issue on our end, but was unable to do so. 
   
   A little context on what we did:
   
Using a datagen source, we sink data into a Hudi table. Before a checkpoint completes, we kill one of the TM's tasks. When the TMs restart, a rollback is triggered. I checked with a colleague of mine, and they mentioned that when Hudi is performing an upsert, there's a shuffle operation. The presence of a shuffle operation will trigger a "global failover".
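   For reference, this is roughly how we killed a TM between checkpoints. This is a hedged sketch: it assumes a standalone Flink cluster where each TaskManager runs as a `TaskManagerRunner` JVM process visible to `jps`; the process name and setup may differ in your deployment (e.g. YARN / Kubernetes):

   ```shell
   # Find the PID of one running TaskManager (assumes `jps` is on the PATH
   # and the process is named TaskManagerRunner, as in a standalone cluster).
   TM_PID=$(jps | awk '/TaskManagerRunner/ {print $1; exit}')

   # Kill it abruptly before the next checkpoint completes, forcing Flink to
   # restart the job and Hudi to roll back the in-flight instant.
   kill -9 "$TM_PID"
   ```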
   
   Here's the Flink SQL I used while attempting to reproduce your issue.
   
   ```sql
   CREATE TEMPORARY TABLE buyer_info (
       id bigint, 
       dec_col decimal(25, 10),
       country string,
       age INT,
       update_time STRING
   ) WITH (
       'connector' = 'datagen',
       'rows-per-second' = '10',
       'fields.age.min' = '0',
       'fields.age.max' = '7',
       'fields.country.length' = '1'
   );
   
   -- Hudi table to write to
   CREATE TEMPORARY TABLE dim_buyer_info_test
   (
       id bigint,
       dec_col decimal(25, 10),
       country string,
       age INT,
       update_time STRING
   ) PARTITIONED BY (age)
   WITH
   (
       -- Hudi settings
       'connector' = 'hudi',
       'hoodie.datasource.write.recordkey.field' = 'id',
       'path' = '/path/to/hudi_table/duplicate_file_id_issue',
       'write.operation' = 'UPSERT',
       'table.type' = 'MERGE_ON_READ',
    'hoodie.compaction.payload.class' = 'org.apache.hudi.common.model.PartialUpdateAvroPayload',
    'hoodie.datasource.write.payload.class' = 'org.apache.hudi.common.model.PartialUpdateAvroPayload',
    'hoodie.table.keygenerator.class' = 'org.apache.hudi.keygen.ComplexAvroKeyGenerator',
       'write.precombine.field' = 'update_time',
       'index.type' = 'BUCKET',
       'hoodie.bucket.index.num.buckets' = '4',
       'write.tasks' = '8',
       'hoodie.bucket.index.hash.field' = 'id',
       'clean.retain_commits' = '5',
       -- Hive sync settings
       'hive_sync.enable' = 'false'
   );
   
   -- Insert into Hudi sink
   INSERT INTO dim_buyer_info_test
   SELECT id, dec_col, country, age, update_time
   FROM buyer_info;
   ```
   
   Might have butchered the explanation above...
   
   As such, we were unable to reproduce the issue you described with a single TM restarting.
   
   Can you please share your job configurations and how you're doing your tests?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
