weipengfei-sj opened a new issue, #6898: URL: https://github.com/apache/seatunnel/issues/6898
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.

### What happened

1. Trigger the savepoint:

./bin/seatunnel.sh -s 846231481092145153

2024-05-24 14:53:46,956 INFO [.c.i.s.ClientInvocationService] [main] - hz.client_1 [seatunnel] [5.1] Running with 2 response threads, dynamic=true
2024-05-24 14:53:47,018 INFO [c.h.c.LifecycleService ] [main] - hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is STARTING
2024-05-24 14:53:47,019 INFO [c.h.c.LifecycleService ] [main] - hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is STARTED
2024-05-24 14:53:47,040 INFO [.c.i.c.ClientConnectionManager] [main] - hz.client_1 [seatunnel] [5.1] Trying to connect to cluster: seatunnel
2024-05-24 14:53:47,043 INFO [.c.i.c.ClientConnectionManager] [main] - hz.client_1 [seatunnel] [5.1] Trying to connect to [localhost]:5801
2024-05-24 14:53:47,072 INFO [c.h.c.LifecycleService ] [main] - hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is CLIENT_CONNECTED
2024-05-24 14:53:47,073 INFO [.c.i.c.ClientConnectionManager] [main] - hz.client_1 [seatunnel] [5.1] Authenticated with server [20.200.176.31]:5801:f60a94f5-bcec-4e8f-a403-070293dfc28e, server version: 5.1, local address: /127.0.0.1:52494
2024-05-24 14:53:47,074 INFO [c.h.i.d.Diagnostics ] [main] - hz.client_1 [seatunnel] [5.1] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
2024-05-24 14:53:47,082 INFO [c.h.c.i.s.ClientClusterService] [hz.client_1.event-2] - hz.client_1 [seatunnel] [5.1]
Members [1] {
    Member [20.200.176.31]:5801 - f60a94f5-bcec-4e8f-a403-070293dfc28e
}
2024-05-24 14:53:47,105 INFO [.c.i.s.ClientStatisticsService] [main] - Client statistics is enabled with period 5 seconds.
2024-05-24 14:53:51,325 INFO [c.h.c.LifecycleService ] [main] - hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is SHUTTING_DOWN
2024-05-24 14:53:51,328 INFO [.c.i.c.ClientConnectionManager] [main] - hz.client_1 [seatunnel] [5.1] Removed connection to endpoint: [20.200.176.31]:5801:f60a94f5-bcec-4e8f-a403-070293dfc28e, connection: ClientConnection{alive=false, connectionId=1, channel=NioChannel{/127.0.0.1:52494->localhost/127.0.0.1:5801}, remoteAddress=[20.200.176.31]:5801, lastReadTime=2024-05-24 14:53:51.324, lastWriteTime=2024-05-24 14:53:47.233, closedTime=2024-05-24 14:53:51.326, connected server version=5.1}
2024-05-24 14:53:51,328 INFO [c.h.c.LifecycleService ] [main] - hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is CLIENT_DISCONNECTED
2024-05-24 14:53:51,330 INFO [c.h.c.LifecycleService ] [main] - hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is SHUTDOWN
2024-05-24 14:53:51,330 INFO [s.c.s.s.c.ClientExecuteCommand] [main] - Closed SeaTunnel client......

2. Query the job status:

2024-05-24 14:55:18,096 INFO [.c.i.s.ClientStatisticsService] [main] - Client statistics is enabled with period 5 seconds.

Job ID              Job Name       Job Status      Submit Time              Finished Time
------------------  -------------  --------------  -----------------------  -----------------------
846231481092145153  SeaTunnel_Job  SAVEPOINT_DONE  2024-05-24 13:55:38.84   2024-05-24 14:54:41.818

3. The log of the client that submitted the job shows that the savepoint was taken normally and the job was then terminated:

2024-05-24 14:54:41,078 INFO [o.a.s.e.c.j.JobMetricsRunner ] [job-metrics-runner-846231481092145153] -
***********************************************
Job Progress Information
***********************************************
Job Id : 846231481092145153
Read Count So Far : 250
Write Count So Far : 250
Average Read Count : 0/s
Average Write Count : 0/s
Last Statistic Time : 2024-05-24 14:53:41
Current Statistic Time : 2024-05-24 14:54:41
***********************************************
2024-05-24 14:54:42,169 INFO [o.a.s.e.c.j.ClientJobProxy ] [main] - Job (846231481092145153) end with state SAVEPOINT_DONE
2024-05-24 14:54:42,170 INFO [s.c.s.s.c.ClientExecuteCommand] [main] -
***********************************************
Job Statistic Information
***********************************************
Start Time : 2024-05-24 13:55:38
End Time : 2024-05-24 14:54:42
Total Time(s) : 3543
Total Read Count : 250
Total Write Count : 250
Total Failed Count : 0
***********************************************
2024-05-24 14:54:42,171 INFO [c.h.c.LifecycleService ] [main] - hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is SHUTTING_DOWN
2024-05-24 14:54:42,176 INFO [.c.i.c.ClientConnectionManager] [main] - hz.client_1 [seatunnel] [5.1] Removed connection to endpoint: [20.200.176.31]:5801:f60a94f5-bcec-4e8f-a403-070293dfc28e, connection: ClientConnection{alive=false, connectionId=1, channel=NioChannel{/127.0.0.1:47888->localhost/127.0.0.1:5801}, remoteAddress=[20.200.176.31]:5801, lastReadTime=2024-05-24 14:54:42.170, lastWriteTime=2024-05-24 14:54:42.169, closedTime=2024-05-24 14:54:42.174, connected server version=5.1}
2024-05-24 14:54:42,176 INFO [c.h.c.LifecycleService ] [main] - hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is CLIENT_DISCONNECTED
2024-05-24 14:54:42,178 INFO [c.h.c.LifecycleService ] [main] - hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is SHUTDOWN
2024-05-24 14:54:42,179 INFO [s.c.s.s.c.ClientExecuteCommand] [main] - Closed SeaTunnel client......
2024-05-24 14:54:42,179 INFO [s.c.s.s.c.ClientExecuteCommand] [main] - Closed metrics executor service ......
2024-05-24 14:54:42,180 INFO [s.c.s.s.c.ClientExecuteCommand] [ForkJoinPool.commonPool-worker-11] - run shutdown hook because get close signal

4. Restart the job from the savepoint:

./bin/seatunnel.sh --config ./config/test-source-kerberos-kafka.yaml -r 846231481092145153

The job fails with the following error:

2024-05-24 14:56:01,286 INFO [c.h.c.LifecycleService ] [main] - hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is CLIENT_DISCONNECTED
2024-05-24 14:56:01,288 INFO [c.h.c.LifecycleService ] [main] - hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is SHUTDOWN
2024-05-24 14:56:01,289 INFO [s.c.s.s.c.ClientExecuteCommand] [main] - Closed SeaTunnel client......
2024-05-24 14:56:01,289 INFO [s.c.s.s.c.ClientExecuteCommand] [main] - Closed metrics executor service ......
2024-05-24 14:56:01,289 ERROR [o.a.s.c.s.SeaTunnel ] [main] - ===============================================================================
2024-05-24 14:56:01,289 ERROR [o.a.s.c.s.SeaTunnel ] [main] - Fatal Error,
2024-05-24 14:56:01,289 ERROR [o.a.s.c.s.SeaTunnel ] [main] - Please submit bug report in https://github.com/apache/seatunnel/issues
2024-05-24 14:56:01,289 ERROR [o.a.s.c.s.SeaTunnel ] [main] - Reason:SeaTunnel job executed failed
2024-05-24 14:56:01,290 ERROR [o.a.s.c.s.SeaTunnel ] [main] - Exception StackTrace:org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel job executed failed
    at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:202)
    at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
    at org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
Caused by: org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: org.apache.seatunnel.engine.server.checkpoint.CheckpointException: CheckpointCoordinator inside have error.
    at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.handleCoordinatorError(CheckpointCoordinator.java:274)
    at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.handleCoordinatorError(CheckpointCoordinator.java:270)
    at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.reportCheckpointErrorFromTask(CheckpointCoordinator.java:376)
    at org.apache.seatunnel.engine.server.checkpoint.CheckpointManager.reportCheckpointErrorFromTask(CheckpointManager.java:183)
    at org.apache.seatunnel.engine.server.checkpoint.operation.CheckpointErrorReportOperation.run(CheckpointErrorReportOperation.java:48)
    at com.hazelcast.spi.impl.operationservice.Operation.call(Operation.java:189)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.call(OperationRunnerImpl.java:273)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:248)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:213)
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:175)
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:139)
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.executeRun(OperationThread.java:123)
    at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102)
Caused by: org.apache.seatunnel.common.utils.SeaTunnelException: java.lang.NullPointerException
    ... 11 more
    at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:194)
    ... 2 more
2024-05-24 14:56:01,290 ERROR [o.a.s.c.s.SeaTunnel ] [main] - ===============================================================================
Exception in thread "main" org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel job executed failed
    at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:202)
    at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
    at org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
Caused by: org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: org.apache.seatunnel.engine.server.checkpoint.CheckpointException: CheckpointCoordinator inside have error.
    at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.handleCoordinatorError(CheckpointCoordinator.java:274)
    at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.handleCoordinatorError(CheckpointCoordinator.java:270)
    at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.reportCheckpointErrorFromTask(CheckpointCoordinator.java:376)
    at org.apache.seatunnel.engine.server.checkpoint.CheckpointManager.reportCheckpointErrorFromTask(CheckpointManager.java:183)
    at org.apache.seatunnel.engine.server.checkpoint.operation.CheckpointErrorReportOperation.run(CheckpointErrorReportOperation.java:48)
    at com.hazelcast.spi.impl.operationservice.Operation.call(Operation.java:189)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.call(OperationRunnerImpl.java:273)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:248)
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:213)
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:175)
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:139)
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.executeRun(OperationThread.java:123)
    at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102)
Caused by: org.apache.seatunnel.common.utils.SeaTunnelException: java.lang.NullPointerException
    ... 11 more
    at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:194)
    ... 2 more
2024-05-24 14:56:01,292 INFO [s.c.s.s.c.ClientExecuteCommand] [ForkJoinPool.commonPool-worker-18] - run shutdown hook because get close signal

5. The cluster log shows the following:

2024-05-24 15:07:02,023 INFO [.s.t.SourceSplitEnumeratorTask] [hz.main.seaTunnel.task.thread-77] - received reader register, readerID: TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=846231481092145153, pipelineId=1, taskGroupId=30001}, taskID=40001, index=1}
2024-05-24 15:07:02,025 ERROR [.s.e.s.c.CheckpointCoordinator] [hz.main.generic-operation.thread-9] - report error from task
org.apache.seatunnel.common.utils.SeaTunnelException: java.lang.NullPointerException
    at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.reportCheckpointErrorFromTask(CheckpointCoordinator.java:376) ~[seatunnel-starter.jar:2.3.5]
    at org.apache.seatunnel.engine.server.checkpoint.CheckpointManager.reportCheckpointErrorFromTask(CheckpointManager.java:183) ~[seatunnel-starter.jar:2.3.5]
    at org.apache.seatunnel.engine.server.checkpoint.operation.CheckpointErrorReportOperation.run(CheckpointErrorReportOperation.java:48) ~[seatunnel-starter.jar:2.3.5]
    at com.hazelcast.spi.impl.operationservice.Operation.call(Operation.java:189) ~[seatunnel-starter.jar:2.3.5]
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.call(OperationRunnerImpl.java:273) ~[seatunnel-starter.jar:2.3.5]
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:248) ~[seatunnel-starter.jar:2.3.5]
    at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:213) ~[seatunnel-starter.jar:2.3.5]
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:175) ~[seatunnel-starter.jar:2.3.5]
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:139) ~[seatunnel-starter.jar:2.3.5]
    at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.executeRun(OperationThread.java:123) ~[seatunnel-starter.jar:2.3.5]
    at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102) ~[seatunnel-starter.jar:2.3.5]
2024-05-24 15:07:02,026 INFO [.s.e.s.c.CheckpointCoordinator] [hz.main.generic-operation.thread-9] - start clean pending checkpoint cause CheckpointCoordinator inside have error.
2024-05-24 15:07:02,027 INFO [.s.e.s.c.CheckpointCoordinator] [hz.main.generic-operation.thread-9] - Turn checkpoint_state_846231481092145153_1 state from RUNNING to FAILED

The job status is now FAILED:

Job ID              Job Name       Job Status  Submit Time              Finished Time
------------------  -------------  ----------  -----------------------  -----------------------
846231481092145153  SeaTunnel_Job  FAILED      2024-05-24 15:09:25.644  2024-05-24 15:09:36.196

### SeaTunnel Version

2.3.5

### SeaTunnel Config

```conf
env {
  # You can set SeaTunnel environment configuration here
  parallelism = 2
  job.mode = "STREAMING"
  checkpoint.interval = 2000
}

source {
  # This is an example source plugin **only for test and demonstrate the feature source plugin**
  FakeSource {
    parallelism = 2
    result_table_name = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }

  # If you would like to get more information about how to configure SeaTunnel and see full list of source plugins,
  # please go to https://seatunnel.apache.org/docs/category/source-v2
}

sink {
  Console {
  }

  # If you would like to get more information about how to configure SeaTunnel and see full list of sink plugins,
  # please go to https://seatunnel.apache.org/docs/category/sink-v2
}
```

### Running Command

```shell
# Trigger the savepoint
./bin/seatunnel.sh -s 846231481092145153
# Restore the job
./bin/seatunnel.sh --config ./config/v2.streaming.conf.template -r 846231481092145153
```

### Error Exception

```log
See the last part of the logs above for the exception details.
```

### Zeta or Flink or Spark Version

zeta

### Java or Scala Version

java1.8

### Screenshots

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
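
The savepoint and restore steps reported above can be replayed with a small wrapper script. This is only a minimal sketch under stated assumptions, not part of SeaTunnel: the `SEATUNNEL_HOME` and `DRY_RUN` variables and the `run`, `savepoint_job`, and `restore_job` helpers are hypothetical names introduced for illustration. The only SeaTunnel options used are the ones that appear in this report (`-s`, `--config`, `-r`). By default the script prints the commands without executing them.

```shell
#!/usr/bin/env bash
# Sketch: replay the savepoint -> restore sequence from this report.
# SEATUNNEL_HOME, DRY_RUN and all function names are illustrative assumptions.
set -eu

SEATUNNEL_HOME="${SEATUNNEL_HOME:-.}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = also execute them

# Print a command, and execute it only when DRY_RUN=0.
run() {
  echo "$*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Step 1 of the report: trigger a savepoint for a running job.
savepoint_job() {
  run "$SEATUNNEL_HOME/bin/seatunnel.sh" -s "$1"
}

# Step 4 of the report: resubmit the job from the savepoint.
restore_job() {
  run "$SEATUNNEL_HOME/bin/seatunnel.sh" --config "$1" -r "$2"
}

# Defaults are the job ID and config path quoted in "Running Command".
JOB_ID="${1:-846231481092145153}"
CONFIG="${2:-./config/v2.streaming.conf.template}"

savepoint_job "$JOB_ID"

if [ "$DRY_RUN" = "0" ]; then
  # In the report, SAVEPOINT_DONE appeared about a minute after the request,
  # so pause here instead of restoring immediately.
  printf 'Press Enter once the job reports SAVEPOINT_DONE... '
  read -r _ || true
fi

restore_job "$CONFIG" "$JOB_ID"
```

Running it with no arguments prints the two commands from the "Running Command" section. Setting `DRY_RUN=0` and passing a job ID and config path executes them against a local installation.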
