big-doudou commented on issue #8892: URL: https://github.com/apache/hudi/issues/8892#issuecomment-1649178251
> @big-doudou Apologies for the late reply. I was trying to reproduce this issue on our end, but was unable to do so.
>
> A little context on what we did:
>
> Using a datagen source, we sink the data into a Hudi table. Before a checkpoint, we kill one of the TM's tasks. Upon doing so, a rollback is triggered when all the TMs restart. I checked with a colleague of mine, and they mentioned that when Hudi is performing an upsert, there is a shuffle operation. The presence of a shuffle operation will trigger a "global failover".
>
> Here's the Flink SQL that I used while attempting to reproduce your issue:
>
> ```sql
> CREATE TEMPORARY TABLE buyer_info (
>   id bigint,
>   dec_col decimal(25, 10),
>   country string,
>   age INT,
>   update_time STRING
> ) WITH (
>   'connector' = 'datagen',
>   'rows-per-second' = '10',
>   'fields.age.min' = '0',
>   'fields.age.max' = '7',
>   'fields.country.length' = '1'
> );
>
> -- Hudi table to write to
> CREATE TEMPORARY TABLE dim_buyer_info_test (
>   id bigint,
>   dec_col decimal(25, 10),
>   country string,
>   age INT,
>   update_time STRING
> ) PARTITIONED BY (age)
> WITH (
>   -- Hudi settings
>   'connector' = 'hudi',
>   'hoodie.datasource.write.recordkey.field' = 'id',
>   'path' = '/path/to/hudi_table/duplicate_file_id_issue',
>   'write.operation' = 'UPSERT',
>   'table.type' = 'MERGE_ON_READ',
>   'hoodie.compaction.payload.class' = 'org.apache.hudi.common.model.PartialUpdateAvroPayload',
>   'hoodie.datasource.write.payload.class' = 'org.apache.hudi.common.model.PartialUpdateAvroPayload',
>   'hoodie.table.keygenerator.class' = 'org.apache.hudi.keygen.ComplexAvroKeyGenerator',
>   'write.precombine.field' = 'update_time',
>   'index.type' = 'BUCKET',
>   'hoodie.bucket.index.num.buckets' = '4',
>   'write.tasks' = '8',
>   'hoodie.bucket.index.hash.field' = 'id',
>   'clean.retain_commits' = '5',
>   -- Hive sync settings
>   'hive_sync.enable' = 'false'
> );
>
> -- Insert into Hudi sink
> INSERT INTO dim_buyer_info_test
> SELECT id, dec_col, country, age, update_time
> FROM buyer_info;
> ```
>
> Might have butchered the explanation above...
>
> As such, we were unable to reproduce your issue where a single TM restarts.
>
> Can you please share your job configurations and how you're doing your tests?

Sorry, I didn't see this in time. My Flink job runs on k8s: before the checkpoint, after some log files are generated, I kill the container.
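For reference, the kill step described above can be scripted on k8s roughly as follows. This is only a sketch under assumptions: the label selector `component=taskmanager` is a common Flink-on-k8s convention, not something confirmed in this thread, so adjust it to your deployment.

```shell
# Pick one TaskManager pod (label selector is an assumption; adjust to your deployment)
TM_POD=$(kubectl get pods -l component=taskmanager \
  -o jsonpath='{.items[0].metadata.name}')

# Once the MOR table has produced some log files but before the next
# checkpoint completes, kill the container abruptly (SIGKILL, no grace period)
kubectl delete pod "$TM_POD" --grace-period=0 --force
```

Killing the pod with `--grace-period=0 --force` ensures the TM dies without flushing state, which is the failure mode being discussed (a restart between checkpoints triggering a rollback).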
