lm520hy opened a new issue, #8781: URL: https://github.com/apache/seatunnel/issues/8781
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.

### What happened

After Spark wrote data to the table, it changed Hudi's table metadata; when SeaTunnel then synced data to the same table, it failed with an error while reading Hudi's metadata.

### SeaTunnel Version

2.3.8

### SeaTunnel Config

```conf
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  FakeSource {
    parallelism = 1
    result_table_name = "fake2"
    row.num = 16
    schema = {
      fields {
        id = "int"
        name = "string"
        price = "double"
        ts = "bigint"
      }
    }
    rows = [
      {
        kind = INSERT
        fields = [7, "l", 1100, 117]
      }
    ]
  }
}

sink {
  Hudi {
    table_dfs_path = "hdfs:///hudi/"
    table_name = "hudi_mor_tbl2"
    table_type = "COPY_ON_WRITE"
    conf_files_path = "/soft/hadoop/etc/hadoop/hdfs-site.xml;/soft/hadoop/etc/hadoop/core-site.xml;/soft/hadoop/etc/hadoop/yarn-site.xml"
    batch_size = 10000
  }
}
```

### Running Command

```shell
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  FakeSource {
    parallelism = 1
    result_table_name = "fake2"
    row.num = 16
    schema = {
      fields {
        id = "int"
        name = "string"
        price = "double"
        ts = "bigint"
      }
    }
    rows = [
      {
        kind = INSERT
        fields = [7, "l", 1100, 117]
      }
    ]
  }
}

sink {
  Hudi {
    table_dfs_path = "hdfs:///hudi/"
    table_name = "hudi_mor_tbl2"
    table_type = "COPY_ON_WRITE"
    conf_files_path = "/soft/hadoop/etc/hadoop/hdfs-site.xml;/soft/hadoop/etc/hadoop/core-site.xml;/soft/hadoop/etc/hadoop/yarn-site.xml"
    batch_size = 10000
  }
}
```

### Error Exception

```log
2025-02-20 19:27:42,144 INFO [a.h.c.t.t.HoodieActiveTimeline] [st-multi-table-sink-writer-2] - Loaded instants upto : Option{val=[20250220192742010__clean__COMPLETED__20250220192742120]}
2025-02-20 19:27:42,145 INFO [o.a.h.c.t.HoodieTableConfig ] [st-multi-table-sink-writer-2] - Loading table properties from hdfs:/hudi/default/hudi_mor_tbl2/.hoodie/hoodie.properties
2025-02-20 19:27:42,147 WARN [o.a.s.e.s.TaskExecutionService]
[BlockingWorker-TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}] - [localhost]:5801 [seatunnel-825957] [5.1] Exception in org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask@1f445812
java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
    at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:253) ~[seatunnel-starter.jar:2.3.8]
    at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:66) ~[seatunnel-starter.jar:2.3.8]
    at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:39) ~[seatunnel-starter.jar:2.3.8]
    at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:27) ~[seatunnel-starter.jar:2.3.8]
    at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.handleRecord(IntermediateBlockingQueue.java:70) ~[seatunnel-starter.jar:2.3.8]
    at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.collect(IntermediateBlockingQueue.java:50) ~[seatunnel-starter.jar:2.3.8]
    at org.apache.seatunnel.engine.server.task.flow.IntermediateQueueFlowLifeCycle.collect(IntermediateQueueFlowLifeCycle.java:51) ~[seatunnel-starter.jar:2.3.8]
    at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.collect(TransformSeaTunnelTask.java:73) ~[seatunnel-starter.jar:2.3.8]
    at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:168) ~[seatunnel-starter.jar:2.3.8]
    at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.call(TransformSeaTunnelTask.java:78) ~[seatunnel-starter.jar:2.3.8]
    at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:693) [seatunnel-starter.jar:2.3.8]
    at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1018) [seatunnel-starter.jar:2.3.8]
    at org.apache.seatunnel.api.tracing.MDCRunnable.run(MDCRunnable.java:39) [seatunnel-starter.jar:2.3.8]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_381]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_381]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_381]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_381]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_381]
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
    at org.apache.seatunnel.api.sink.multitablesink.MultiTableSinkWriter.prepareCommit(MultiTableSinkWriter.java:258) ~[seatunnel-starter.jar:2.3.8]
    at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:188) ~[seatunnel-starter.jar:2.3.8]
    ... 17 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
    at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:1.8.0_381]
    at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[?:1.8.0_381]
    at org.apache.seatunnel.api.sink.multitablesink.MultiTableSinkWriter.prepareCommit(MultiTableSinkWriter.java:256) ~[seatunnel-starter.jar:2.3.8]
    at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:188) ~[seatunnel-starter.jar:2.3.8]
    ... 17 more
Caused by: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
    at org.apache.hudi.client.HoodieTimelineArchiver.getInstantsToArchive(HoodieTimelineArchiver.java:520) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.hudi.client.HoodieTimelineArchiver.archiveIfRequired(HoodieTimelineArchiver.java:165) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.hudi.client.BaseHoodieTableServiceClient.archive(BaseHoodieTableServiceClient.java:782) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.hudi.client.BaseHoodieWriteClient.archive(BaseHoodieWriteClient.java:867) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.hudi.client.BaseHoodieWriteClient.autoArchiveOnCommit(BaseHoodieWriteClient.java:596) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.hudi.client.BaseHoodieWriteClient.mayBeCleanAndArchive(BaseHoodieWriteClient.java:562) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.hudi.client.BaseHoodieWriteClient.postWrite(BaseHoodieWriteClient.java:528) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.hudi.client.HoodieJavaWriteClient.insert(HoodieJavaWriteClient.java:141) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.seatunnel.connectors.seatunnel.hudi.sink.writer.HudiRecordWriter.flush(HudiRecordWriter.java:160) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.seatunnel.connectors.seatunnel.hudi.sink.writer.HudiRecordWriter.prepareCommit(HudiRecordWriter.java:186) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.seatunnel.connectors.seatunnel.hudi.sink.writer.HudiSinkWriter.prepareCommit(HudiSinkWriter.java:101) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.seatunnel.api.sink.multitablesink.MultiTableSinkWriter.lambda$prepareCommit$4(MultiTableSinkWriter.java:241) ~[seatunnel-starter.jar:2.3.8]
    ... 6 more
Caused by: java.lang.UnsupportedOperationException
    at org.apache.hudi.metadata.FileSystemBackedTableMetadata.getLatestCompactionTime(FileSystemBackedTableMetadata.java:280) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.hudi.client.HoodieTimelineArchiver.getInstantsToArchive(HoodieTimelineArchiver.java:510) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.hudi.client.HoodieTimelineArchiver.archiveIfRequired(HoodieTimelineArchiver.java:165) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.hudi.client.BaseHoodieTableServiceClient.archive(BaseHoodieTableServiceClient.java:782) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.hudi.client.BaseHoodieWriteClient.archive(BaseHoodieWriteClient.java:867) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.hudi.client.BaseHoodieWriteClient.autoArchiveOnCommit(BaseHoodieWriteClient.java:596) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.hudi.client.BaseHoodieWriteClient.mayBeCleanAndArchive(BaseHoodieWriteClient.java:562) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.hudi.client.BaseHoodieWriteClient.postWrite(BaseHoodieWriteClient.java:528) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.hudi.client.HoodieJavaWriteClient.insert(HoodieJavaWriteClient.java:141) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.seatunnel.connectors.seatunnel.hudi.sink.writer.HudiRecordWriter.flush(HudiRecordWriter.java:160) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.seatunnel.connectors.seatunnel.hudi.sink.writer.HudiRecordWriter.prepareCommit(HudiRecordWriter.java:186) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.seatunnel.connectors.seatunnel.hudi.sink.writer.HudiSinkWriter.prepareCommit(HudiSinkWriter.java:101) ~[connector-hudi-2.3.8.jar:2.3.8]
    at org.apache.seatunnel.api.sink.multitablesink.MultiTableSinkWriter.lambda$prepareCommit$4(MultiTableSinkWriter.java:241) ~[seatunnel-starter.jar:2.3.8]
    ... 6 more
2025-02-20 19:27:42,152 INFO [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}] - [localhost]:5801 [seatunnel-825957] [5.1] taskDone, taskId = 70000, taskGroup = TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}
2025-02-20 19:27:42,152 INFO [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}] - [localhost]:5801 [seatunnel-825957] [5.1] task 70000 error with exception: [java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table], cancel other task in taskGroup TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}.
2025-02-20 19:27:42,152 WARN [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}] - [localhost]:5801 [seatunnel-825957] [5.1] Interrupted task 60000 - org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask@4fe9a7d5
2025-02-20 19:27:42,152 INFO [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}] - [localhost]:5801 [seatunnel-825957] [5.1] taskDone, taskId = 60000, taskGroup = TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}
2025-02-20 19:27:42,154 INFO [.a.h.c.t.HoodieTableMetaClient] [ForkJoinPool.commonPool-worker-1] - Loading HoodieTableMetaClient from hdfs:///hudi//default/hudi_mor_tbl2
2025-02-20 19:27:42,155 INFO [o.a.h.c.t.HoodieTableConfig ] [ForkJoinPool.commonPool-worker-1] - Loading table properties from hdfs:/hudi/default/hudi_mor_tbl2/.hoodie/hoodie.properties
2025-02-20 19:27:42,158 INFO [.a.h.c.t.HoodieTableMetaClient] [ForkJoinPool.commonPool-worker-1] - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from hdfs:///hudi//default/hudi_mor_tbl2
2025-02-20 19:27:42,158 INFO [.a.h.c.t.HoodieTableMetaClient] [ForkJoinPool.commonPool-worker-1] - Loading Active commit timeline for hdfs:///hudi//default/hudi_mor_tbl2
2025-02-20 19:27:42,159 INFO [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}] - [localhost]:5801 [seatunnel-825957] [5.1] taskGroup TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000} complete with FAILED
2025-02-20 19:27:42,160 INFO [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}] - [localhost]:5801 [seatunnel-825957] [5.1] task 60000 error with exception: [java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table], cancel other task in taskGroup TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}.
2025-02-20 19:27:42,160 INFO [o.a.s.e.s.TaskExecutionService] [hz.main.seaTunnel.task.thread-6] - [localhost]:5801 [seatunnel-825957] [5.1] Task TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000} complete with state FAILED
2025-02-20 19:27:42,160 INFO [a.h.c.t.t.HoodieActiveTimeline] [ForkJoinPool.commonPool-worker-1] - Loaded instants upto : Option{val=[20250220192742010__clean__COMPLETED__20250220192742120]}
2025-02-20 19:27:42,160 INFO [o.a.h.c.u.CleanerUtils ] [ForkJoinPool.commonPool-worker-1] - Cleaned failed attempts if any
2025-02-20 19:27:42,160 INFO [o.a.s.e.s.CoordinatorService ] [hz.main.seaTunnel.task.thread-6] - [localhost]:5801 [seatunnel-825957] [5.1] Received task end from execution TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}, state FAILED
2025-02-20 19:27:42,161 INFO [.a.h.c.t.HoodieTableMetaClient] [ForkJoinPool.commonPool-worker-1] - Loading HoodieTableMetaClient from hdfs:///hudi//default/hudi_mor_tbl2
2025-02-20 19:27:42,161 INFO [o.a.s.a.e.LoggingEventHandler ] [hz.main.generic-operation.thread-36] - log event: ReaderCloseEvent(createdTime=1740050862160, jobId=944918221583024129, eventType=LIFECYCLE_READER_CLOSE)
2025-02-20 19:27:42,162 INFO [o.a.h.c.t.HoodieTableConfig ] [ForkJoinPool.commonPool-worker-1] - Loading table properties from hdfs:/hudi/default/hudi_mor_tbl2/.hoodie/hoodie.properties
2025-02-20 19:27:42,162 INFO [o.a.s.e.s.d.p.PhysicalVertex ] [hz.main.seaTunnel.task.thread-6] - Job SeaTunnel_Job (944918221583024129), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-FakeSource]-SourceTask (1/1)] turned from state RUNNING to FAILED.
2025-02-20 19:27:42,162 INFO [o.a.s.e.s.d.p.PhysicalVertex ] [hz.main.seaTunnel.task.thread-6] - Job SeaTunnel_Job (944918221583024129), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-FakeSource]-SourceTask (1/1)] state process is stopped
2025-02-20 19:27:42,162 ERROR [o.a.s.e.s.d.p.PhysicalVertex ] [hz.main.seaTunnel.task.thread-6] - Job SeaTunnel_Job (944918221583024129), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-FakeSource]-SourceTask (1/1)] end with state FAILED and Exception:
java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
    at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:253)
    at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:66)
    at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:39)
    at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:27)
    at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.handleRecord(IntermediateBlockingQueue.java:70)
    at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.collect(IntermediateBlockingQueue.java:50)
    at org.apache.seatunnel.engine.server.task.flow.IntermediateQueueFlowLifeCycle.collect(IntermediateQueueFlowLifeCycle.java:51)
    at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.collect(TransformSeaTunnelTask.java:73)
    at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:168)
    at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.call(TransformSeaTunnelTask.java:78)
    at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:693)
    at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1018)
    at org.apache.seatunnel.api.tracing.MDCRunnable.run(MDCRunnable.java:39)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
    at org.apache.seatunnel.api.sink.multitablesink.MultiTableSinkWriter.prepareCommit(MultiTableSinkWriter.java:258)
    at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:188)
    ... 17 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.seatunnel.api.sink.multitablesink.MultiTableSinkWriter.prepareCommit(MultiTableSinkWriter.java:256)
    ... 18 more
Caused by: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
```

### Zeta or Flink or Spark Version

_No response_

### Java or Scala Version

_No response_

### Screenshots

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
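A possible direction, inferred from the stack trace rather than a confirmed fix: the innermost `java.lang.UnsupportedOperationException` is thrown by `FileSystemBackedTableMetadata.getLatestCompactionTime`, which Hudi's archiver only consults when the table was written with the Hudi metadata table enabled. Spark enables the metadata table by default in recent Hudi versions, while SeaTunnel's `HoodieJavaWriteClient` path falls back to the filesystem-backed metadata implementation. As a hedged workaround sketch (the option name is a real Hudi config key, but whether this resolves the conflict for this table is an assumption), the Spark job writing to `hudi_mor_tbl2` could try disabling the metadata table so both writers use the filesystem-backed path:

```properties
# Hypothetical workaround (untested): pass this Hudi write option on the
# Spark side so Spark does not maintain the metadata table for this table,
# keeping it readable by SeaTunnel's filesystem-backed metadata reader.
hoodie.metadata.enable=false
```

Note that disabling the metadata table on an existing table triggers its cleanup on the next write, so this should be tried on a test table first.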
