big-doudou commented on issue #8892: URL: https://github.com/apache/hudi/issues/8892#issuecomment-1632159235
The following steps reproduce this error:

1. Start the Flink task.
2. Before the delta commit completes, while some log files have already been written to disk, kill the TaskManager (TM).
3. Wait for a delta commit to complete, then cancel the task and restart it. The restart fails with the exception: duplicate file id.

Timeline before kill:

    2023-07-12 16:42 viewfs:///.hoodie/20230712164243784.deltacommit.inflight
    2023-07-12 16:42 viewfs:///.hoodie/20230712164243784.deltacommit.requested

Files before kill:

    2023-07-12 16:45 viewfs:///.00000255-add0-4f0d-b367-f1f7954c7717_20230712164243784.log.1_5-64-0

Timeline after reboot:

    2023-07-12 16:58 viewfs:///.hoodie/20230712164243784.deltacommit
    2023-07-12 16:42 viewfs:///.hoodie/20230712164243784.deltacommit.inflight
    2023-07-12 16:42 viewfs:///.hoodie/20230712164243784.deltacommit.requested

Files after kill:

    2023-07-12 16:50 viewfs:///.00000255-00fd-4f23-9373-d85e12686dd3_20230712164243784.log.1_5-64-1
    2023-07-12 16:45 viewfs:///.00000255-add0-4f0d-b367-f1f7954c7717_20230712164243784.log.1_5-64-0

You can see that the instant 20230712164243784 is reused and the file is duplicated: the log file written before the kill (write token 5-64-0) is still present next to the one written after recovery (write token 5-64-1).

I traced the error to the line linked below. When the job recovers from the failure automatically, the bootstrap event is not sent, so the program does not roll back the old instant. As a result, the files temporarily written by instant 20230712164243784 are never cleaned up.

https://github.com/danny0405/hudi/blob/50712dceb582c0ebbce263dec4413c11b2e92ddd/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/common/AbstractStreamWriteFunction.java#L216C1-L216C1
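For anyone who wants to check their own table for the same leftover files, here is a rough Python sketch that scans a file listing and reports instants for which two different file ids share the same leading segment. It is not part of Hudi: the helper names `parse_log_name` and `find_conflicts` are made up for illustration, the regular expression is inferred only from the file names in the listings above, and treating the leading `00000255` segment as the key that must be unique per instant is my assumption (the comment does not say so explicitly).

```python
import re
from collections import defaultdict

# Name layout inferred from the listings in this comment (an assumption,
# not taken from Hudi source): .<fileId>_<instant>.log.<version>_<writeToken>
LOG_NAME = re.compile(
    r"^\.(?P<file_id>[0-9a-f-]+)_(?P<instant>\d+)"
    r"\.log\.(?P<version>\d+)_(?P<token>\d+-\d+-\d+)$"
)


def parse_log_name(path):
    """Return the parts of a log file name, or None if it is not a log file."""
    name = path.rsplit("/", 1)[-1]
    match = LOG_NAME.match(name)
    return match.groupdict() if match else None


def find_conflicts(paths):
    """Group log files by (leading file id segment, instant) and keep the
    groups that contain more than one distinct file id."""
    groups = defaultdict(set)
    for path in paths:
        info = parse_log_name(path)
        if info is None:
            continue  # timeline files and anything else are skipped
        # Assumption: the segment before the first '-' identifies the slot
        # that should map to a single file id.
        prefix = info["file_id"].split("-", 1)[0]
        groups[(prefix, info["instant"])].add(info["file_id"])
    return {key: sorted(ids) for key, ids in groups.items() if len(ids) > 1}


# The two log files listed after the kill, plus one timeline file.
files = [
    "viewfs:///.00000255-00fd-4f23-9373-d85e12686dd3_20230712164243784.log.1_5-64-1",
    "viewfs:///.00000255-add0-4f0d-b367-f1f7954c7717_20230712164243784.log.1_5-64-0",
    "viewfs:///.hoodie/20230712164243784.deltacommit",
]

conflicts = find_conflicts(files)
for (prefix, instant), file_ids in conflicts.items():
    print(prefix, instant, file_ids)
```

Run against the listing from this report, the two log files end up in the same group for instant 20230712164243784, which matches the duplicate described above. The script only flags candidates; whether a flagged file is really a leftover from a failed attempt still has to be confirmed against the timeline.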
