Junning Liang created HUDI-7520:
-----------------------------------

             Summary: A rollback exception occurs when flink writes the mor 
table and is generating checkpoints after restarting the task.
                 Key: HUDI-7520
                 URL: https://issues.apache.org/jira/browse/HUDI-7520
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Junning Liang


When I restarted flink task that writes hudi mor table, the task was being 
submitted to deltacommit .

 
{code:java}
024-03-19 14:35:13,125 INFO  org.apache.hudi.client.BaseHoodieWriteClient       
          [] - Generate a new instant time: 20240319143513125 action: 
deltacommit
2024-03-19 14:35:13,125 INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Creating a 
new instant [==>20240319143513125__deltacommit__REQUESTED]
2024-03-19 14:35:13,216 INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Checking for 
file exists ?hdfs://xxxx_hudi_di/.hoodie/20240319143513125.deltacommit.requested
2024-03-19 14:35:13,339 INFO  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline   [] - Create new 
file for toInstant 
?hdfs://xxxx_hudi_di/.hoodie/20240319143513125.deltacommit.inflight
2024-03-19 14:35:13,585 INFO  
org.apache.hudi.sink.StreamWriteOperatorCoordinator          [] - Create 
instant [20240319143513125] for table [xxxx_hudi_di] with type [MERGE_ON_READ]
2024-03-19 14:35:13,585 INFO  
org.apache.hudi.sink.StreamWriteOperatorCoordinator          [] - Executor 
executes action [commits the instant 20240319143212953] success!
2024-03-19 14:35:22,328 INFO  org.apache.hudi.hive.ddl.HMSDDLExecutor           
           [] - Time taken to getTableSchema: 53 ms
2024-03-19 14:35:22,329 INFO  org.apache.hudi.hive.HiveSyncTool                 
           [] - No Schema difference for 
dwd_trip_dri_acq_slient_log_on_realtime_hudi_di
2024-03-19 14:35:22,329 INFO  org.apache.hudi.hive.HiveSyncTool                 
           [] - Schema sync complete. Syncing partitions for xxxx_hudi_di
2024-03-19 14:35:22,383 INFO  org.apache.hudi.hive.HiveSyncTool                 
           [] - Last commit time synced was found to be 20240319141413551
2024-03-19 14:35:22,383 INFO  org.apache.hudi.sync.common.HoodieSyncClient      
           [] - Last commit time synced is 20240319141413551, Getting commits 
since then
2024-03-19 14:35:48,362 INFO  org.apache.hudi.hive.HiveSyncTool                 
           [] - Storage partitions scan complete. Found []
2024-03-19 14:35:48,499 INFO  org.apache.hudi.hive.HiveSyncTool                 
           [] - Sync complete for 
dwd_trip_dri_acq_slient_log_on_realtime_hudi_di
2024-03-19 14:35:48,499 INFO  org.apache.hudi.hive.HiveSyncTool                 
           [] - Trying to sync hoodie table xxxx_hudi_di_ro with base path 
hdfs://xxxxx_hudi_di of type MERGE_ON_READ
2024-03-19 14:37:28,079 INFO  org.apache.hudi.common.table.TableSchemaResolver  
           [] - Failed to read operation field from avro schema (IOException 
when reading logblock from log file 
HoodieLogFile{pathStr='hdfs://xxxx_hudi_di/.00000004-9003-4302-9576-530ddaf9bc9b_20240315190136139.log.1_4-8-0',
 fileLen=-1})
2024-03-19 14:37:28,079 WARN  
org.apache.hudi.common.table.log.HoodieLogFileReader         [] - unable to 
close input stream for log file 
HoodieLogFile{pathStr='hdfs://xxxx_hudi_di/.00000004-9003-4302-9576-530ddaf9bc9b_20240315190136139.log.1_4-8-0',
 fileLen=-1}
java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:896) 
~[flink-shaded-hadoop-2-uber-2.7.2-5530-10.7.jar:2.7.2-5530-10.7]
        at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:762) 
~[flink-shaded-hadoop-2-uber-2.7.2-5530-10.7.jar:2.7.2-5530-10.7]
        at java.io.BufferedInputStream.close(BufferedInputStream.java:483) 
~[?:1.8.0_312]
        at java.io.FilterInputStream.close(FilterInputStream.java:181) 
~[?:1.8.0_312]
        at java.io.FilterInputStream.close(FilterInputStream.java:181) 
~[?:1.8.0_312]
        at 
org.apache.hudi.common.table.log.HoodieLogFileReader.close(HoodieLogFileReader.java:360)
 
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-ce65fe7132bd4b628bc782c637d8dd73:0.13.1-012]
        at 
org.apache.hudi.common.table.log.HoodieLogFileReader.lambda$addShutDownHook$0(HoodieLogFileReader.java:143)
 
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-ce65fe7132bd4b628bc782c637d8dd73:0.13.1-012]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_312] {code}
And the task restart completed, it occurred rollback exception and flink task  
always faillover.
{code:java}
024-03-19 16:16:33,583 ERROR 
org.apache.hudi.sink.StreamWriteOperatorCoordinator          [] - Executor 
executes action [initialize instant ] error
org.apache.hudi.exception.HoodieRollbackException: Failed to rollback 
hdfs://xxxx_hudi_di commits 20240319143513125
        at 
org.apache.hudi.client.BaseHoodieTableServiceClient.rollback(BaseHoodieTableServiceClient.java:826)
 
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
        at 
org.apache.hudi.client.BaseHoodieTableServiceClient.rollbackFailedWrites(BaseHoodieTableServiceClient.java:730)
 
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
        at 
org.apache.hudi.client.BaseHoodieTableServiceClient.rollbackFailedWrites(BaseHoodieTableServiceClient.java:714)
 
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
        at 
org.apache.hudi.client.BaseHoodieTableServiceClient.rollbackFailedWrites(BaseHoodieTableServiceClient.java:709)
 
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
        at 
org.apache.hudi.client.BaseHoodieWriteClient.lambda$startCommit$afea71c0$1(BaseHoodieWriteClient.java:826)
 
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
        at 
org.apache.hudi.common.util.CleanerUtils.rollbackFailedWrites(CleanerUtils.java:156)
 
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
        at 
org.apache.hudi.client.BaseHoodieWriteClient.startCommit(BaseHoodieWriteClient.java:825)
 
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
        at 
org.apache.hudi.sink.StreamWriteOperatorCoordinator.startInstant(StreamWriteOperatorCoordinator.java:415)
 
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
        at 
org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$initInstant$6(StreamWriteOperatorCoordinator.java:442)
 
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
        at 
org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130)
 
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_312]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_312]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_312]
Caused by: org.apache.hudi.exception.HoodieRollbackException: Found commits 
after time :20240319143513125, please rollback greater commits first
        at 
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.validateRollbackCommitSequence(BaseRollbackActionExecutor.java:172)
 
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
        at 
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.doRollbackAndGetStats(BaseRollbackActionExecutor.java:211)
 
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
        at 
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.runRollback(BaseRollbackActionExecutor.java:108)
 
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
        at 
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.execute(BaseRollbackActionExecutor.java:135)
 
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
        at 
org.apache.hudi.table.HoodieFlinkMergeOnReadTable.rollback(HoodieFlinkMergeOnReadTable.java:128)
 
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
        at 
org.apache.hudi.client.BaseHoodieTableServiceClient.rollback(BaseHoodieTableServiceClient.java:809)
 
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
        ... 12 more {code}
In the end, it can only be restored to normal by deleting the unfinished 
deltacommit and rollback files.
{code:java}
-rw-r--r--   3        9598 2024-03-19 14:35 
hdfs://xxxx_hudi_di/.hoodie/20240319143212953.deltacommit
-rw-r--r--   3           0 2024-03-19 14:32 
hdfs://xxxx_hudi_di/.hoodie/20240319143212953.deltacommit.inflight
-rw-r--r--   3           0 2024-03-19 14:32 
hdfs://xxxx_hudi_di/.hoodie/20240319143212953.deltacommit.requested
-rw-r--r--   3           0 2024-03-19 14:35 
hdfs://xxxx_hudi_di/.hoodie/20240319143513125.deltacommit.inflight
-rw-r--r--   3           0 2024-03-19 14:35 
hdfs://xxxx_hudi_di/.hoodie/20240319143513125.deltacommit.requested
-rw-r--r--   3        1413 2024-03-19 14:41 
hdfs://xxxx_hudi_di/.hoodie/20240319144118843.rollback
-rw-r--r--   3           0 2024-03-19 14:41 
hdfs://xxxx_hudi_di/.hoodie/20240319144118843.rollback.inflight
-rw-r--r--   3        1229 2024-03-19 14:41 
hdfs://xxxx_hudi_di/.hoodie/20240319144118843.rollback.requested
-rw-r--r--   3        8070 2024-03-19 15:07 
hdfs://xxxx_hudi_di/.hoodie/20240319150205132.commit
-rw-r--r--   3           0 2024-03-19 15:02 
hdfs://xxxx_hudi_di/.hoodie/20240319150205132.compaction.inflight
-rw-r--r--   3        4758 2024-03-19 15:02 
hdfs://xxxx_hudi_di/.hoodie/20240319150205132.compaction.requested
-rw-r--r--   3           0 2024-03-19 15:54 
hdfs://xxxx_hudi_di/.hoodie/20240319155456269.rollback.inflight
-rw-r--r--   3        1230 2024-03-19 15:54 
hdfs://xxxx_hudi_di/.hoodie/20240319155456269.rollback.requested
drwxr-xr-x   -           0 2023-11-01 14:15 hdfs://xxxx_hudi_di/.hoodie/archived
-rw-r--r--   3        1632 2024-03-15 18:04 
hdfs://xxxx_hudi_di/.hoodie/hoodie.properties {code}
{code:java}
hdfs dfs -rm hdfs://xxxx_hudi_di/.hoodie/20240319143513125.deltacommit.requested
hdfs dfs -rm hdfs://xxxx_hudi_di/.hoodie/20240319143513125.deltacommit.
inflight
hdfs dfs -rm
hdfs://xxxx_hudi_di/.hoodie/20240319155456269.rollback.inflight
hdfs dfs -rm
hdfs://xxxx_hudi_di/.hoodie/20240319155456269.rollback.requested
{code}
 

 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to