Junning Liang created HUDI-7520:
-----------------------------------
Summary: A rollback exception occurs when flink writes the mor
table and is generating checkpoints after restarting the task.
Key: HUDI-7520
URL: https://issues.apache.org/jira/browse/HUDI-7520
Project: Apache Hudi
Issue Type: Bug
Reporter: Junning Liang
When I restarted flink task that writes hudi mor table, the task was being
submitted to deltacommit .
{code:java}
024-03-19 14:35:13,125 INFO org.apache.hudi.client.BaseHoodieWriteClient
[] - Generate a new instant time: 20240319143513125 action:
deltacommit
2024-03-19 14:35:13,125 INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline [] - Creating a
new instant [==>20240319143513125__deltacommit__REQUESTED]
2024-03-19 14:35:13,216 INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline [] - Checking for
file exists ?hdfs://xxxx_hudi_di/.hoodie/20240319143513125.deltacommit.requested
2024-03-19 14:35:13,339 INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline [] - Create new
file for toInstant
?hdfs://xxxx_hudi_di/.hoodie/20240319143513125.deltacommit.inflight
2024-03-19 14:35:13,585 INFO
org.apache.hudi.sink.StreamWriteOperatorCoordinator [] - Create
instant [20240319143513125] for table [xxxx_hudi_di] with type [MERGE_ON_READ]
2024-03-19 14:35:13,585 INFO
org.apache.hudi.sink.StreamWriteOperatorCoordinator [] - Executor
executes action [commits the instant 20240319143212953] success!
2024-03-19 14:35:22,328 INFO org.apache.hudi.hive.ddl.HMSDDLExecutor
[] - Time taken to getTableSchema: 53 ms
2024-03-19 14:35:22,329 INFO org.apache.hudi.hive.HiveSyncTool
[] - No Schema difference for
dwd_trip_dri_acq_slient_log_on_realtime_hudi_di
2024-03-19 14:35:22,329 INFO org.apache.hudi.hive.HiveSyncTool
[] - Schema sync complete. Syncing partitions for xxxx_hudi_di
2024-03-19 14:35:22,383 INFO org.apache.hudi.hive.HiveSyncTool
[] - Last commit time synced was found to be 20240319141413551
2024-03-19 14:35:22,383 INFO org.apache.hudi.sync.common.HoodieSyncClient
[] - Last commit time synced is 20240319141413551, Getting commits
since then
2024-03-19 14:35:48,362 INFO org.apache.hudi.hive.HiveSyncTool
[] - Storage partitions scan complete. Found []
2024-03-19 14:35:48,499 INFO org.apache.hudi.hive.HiveSyncTool
[] - Sync complete for
dwd_trip_dri_acq_slient_log_on_realtime_hudi_di
2024-03-19 14:35:48,499 INFO org.apache.hudi.hive.HiveSyncTool
[] - Trying to sync hoodie table xxxx_hudi_di_ro with base path
hdfs://xxxxx_hudi_di of type MERGE_ON_READ
2024-03-19 14:37:28,079 INFO org.apache.hudi.common.table.TableSchemaResolver
[] - Failed to read operation field from avro schema (IOException
when reading logblock from log file
HoodieLogFile{pathStr='hdfs://xxxx_hudi_di/.00000004-9003-4302-9576-530ddaf9bc9b_20240315190136139.log.1_4-8-0',
fileLen=-1})
2024-03-19 14:37:28,079 WARN
org.apache.hudi.common.table.log.HoodieLogFileReader [] - unable to
close input stream for log file
HoodieLogFile{pathStr='hdfs://xxxx_hudi_di/.00000004-9003-4302-9576-530ddaf9bc9b_20240315190136139.log.1_4-8-0',
fileLen=-1}
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:896)
~[flink-shaded-hadoop-2-uber-2.7.2-5530-10.7.jar:2.7.2-5530-10.7]
at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:762)
~[flink-shaded-hadoop-2-uber-2.7.2-5530-10.7.jar:2.7.2-5530-10.7]
at java.io.BufferedInputStream.close(BufferedInputStream.java:483)
~[?:1.8.0_312]
at java.io.FilterInputStream.close(FilterInputStream.java:181)
~[?:1.8.0_312]
at java.io.FilterInputStream.close(FilterInputStream.java:181)
~[?:1.8.0_312]
at
org.apache.hudi.common.table.log.HoodieLogFileReader.close(HoodieLogFileReader.java:360)
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-ce65fe7132bd4b628bc782c637d8dd73:0.13.1-012]
at
org.apache.hudi.common.table.log.HoodieLogFileReader.lambda$addShutDownHook$0(HoodieLogFileReader.java:143)
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-ce65fe7132bd4b628bc782c637d8dd73:0.13.1-012]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_312] {code}
And the task restart completed, it occurred rollback exception and flink task
always faillover.
{code:java}
024-03-19 16:16:33,583 ERROR
org.apache.hudi.sink.StreamWriteOperatorCoordinator [] - Executor
executes action [initialize instant ] error
org.apache.hudi.exception.HoodieRollbackException: Failed to rollback
hdfs://xxxx_hudi_di commits 20240319143513125
at
org.apache.hudi.client.BaseHoodieTableServiceClient.rollback(BaseHoodieTableServiceClient.java:826)
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
at
org.apache.hudi.client.BaseHoodieTableServiceClient.rollbackFailedWrites(BaseHoodieTableServiceClient.java:730)
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
at
org.apache.hudi.client.BaseHoodieTableServiceClient.rollbackFailedWrites(BaseHoodieTableServiceClient.java:714)
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
at
org.apache.hudi.client.BaseHoodieTableServiceClient.rollbackFailedWrites(BaseHoodieTableServiceClient.java:709)
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
at
org.apache.hudi.client.BaseHoodieWriteClient.lambda$startCommit$afea71c0$1(BaseHoodieWriteClient.java:826)
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
at
org.apache.hudi.common.util.CleanerUtils.rollbackFailedWrites(CleanerUtils.java:156)
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
at
org.apache.hudi.client.BaseHoodieWriteClient.startCommit(BaseHoodieWriteClient.java:825)
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
at
org.apache.hudi.sink.StreamWriteOperatorCoordinator.startInstant(StreamWriteOperatorCoordinator.java:415)
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
at
org.apache.hudi.sink.StreamWriteOperatorCoordinator.lambda$initInstant$6(StreamWriteOperatorCoordinator.java:442)
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
at
org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130)
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[?:1.8.0_312]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[?:1.8.0_312]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_312]
Caused by: org.apache.hudi.exception.HoodieRollbackException: Found commits
after time :20240319143513125, please rollback greater commits first
at
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.validateRollbackCommitSequence(BaseRollbackActionExecutor.java:172)
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
at
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.doRollbackAndGetStats(BaseRollbackActionExecutor.java:211)
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
at
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.runRollback(BaseRollbackActionExecutor.java:108)
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
at
org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.execute(BaseRollbackActionExecutor.java:135)
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
at
org.apache.hudi.table.HoodieFlinkMergeOnReadTable.rollback(HoodieFlinkMergeOnReadTable.java:128)
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
at
org.apache.hudi.client.BaseHoodieTableServiceClient.rollback(BaseHoodieTableServiceClient.java:809)
~[blob_p-9c5a14ae562d04e25991a03bf9668559004f6a49-29eb5df5fcb250d5098a7564564fc47b:0.13.1-012]
... 12 more {code}
In the end, it can only be restored to normal by deleting the unfinished
deltacommit and rollback files.
{code:java}
-rw-r--r-- 3 9598 2024-03-19 14:35
hdfs://xxxx_hudi_di/.hoodie/20240319143212953.deltacommit
-rw-r--r-- 3 0 2024-03-19 14:32
hdfs://xxxx_hudi_di/.hoodie/20240319143212953.deltacommit.inflight
-rw-r--r-- 3 0 2024-03-19 14:32
hdfs://xxxx_hudi_di/.hoodie/20240319143212953.deltacommit.requested
-rw-r--r-- 3 0 2024-03-19 14:35
hdfs://xxxx_hudi_di/.hoodie/20240319143513125.deltacommit.inflight
-rw-r--r-- 3 0 2024-03-19 14:35
hdfs://xxxx_hudi_di/.hoodie/20240319143513125.deltacommit.requested
-rw-r--r-- 3 1413 2024-03-19 14:41
hdfs://xxxx_hudi_di/.hoodie/20240319144118843.rollback
-rw-r--r-- 3 0 2024-03-19 14:41
hdfs://xxxx_hudi_di/.hoodie/20240319144118843.rollback.inflight
-rw-r--r-- 3 1229 2024-03-19 14:41
hdfs://xxxx_hudi_di/.hoodie/20240319144118843.rollback.requested
-rw-r--r-- 3 8070 2024-03-19 15:07
hdfs://xxxx_hudi_di/.hoodie/20240319150205132.commit
-rw-r--r-- 3 0 2024-03-19 15:02
hdfs://xxxx_hudi_di/.hoodie/20240319150205132.compaction.inflight
-rw-r--r-- 3 4758 2024-03-19 15:02
hdfs://xxxx_hudi_di/.hoodie/20240319150205132.compaction.requested
-rw-r--r-- 3 0 2024-03-19 15:54
hdfs://xxxx_hudi_di/.hoodie/20240319155456269.rollback.inflight
-rw-r--r-- 3 1230 2024-03-19 15:54
hdfs://xxxx_hudi_di/.hoodie/20240319155456269.rollback.requested
drwxr-xr-x - 0 2023-11-01 14:15 hdfs://xxxx_hudi_di/.hoodie/archived
-rw-r--r-- 3 1632 2024-03-15 18:04
hdfs://xxxx_hudi_di/.hoodie/hoodie.properties {code}
{code:java}
hdfs dfs -rm hdfs://xxxx_hudi_di/.hoodie/20240319143513125.deltacommit.requested
hdfs dfs -rm hdfs://xxxx_hudi_di/.hoodie/20240319143513125.deltacommit.
inflight
hdfs dfs -rm
hdfs://xxxx_hudi_di/.hoodie/20240319155456269.rollback.inflight
hdfs dfs -rm
hdfs://xxxx_hudi_di/.hoodie/20240319155456269.rollback.requested
{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)