weipengfei-sj opened a new issue, #6898:
URL: https://github.com/apache/seatunnel/issues/6898

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.
   
   
   ### What happened
   
   1. Trigger the savepoint operation
   
    ./bin/seatunnel.sh -s 846231481092145153
   2024-05-24 14:53:46,956 INFO  [.c.i.s.ClientInvocationService] [main] - 
hz.client_1 [seatunnel] [5.1] Running with 2 response threads, dynamic=true
   2024-05-24 14:53:47,018 INFO  [c.h.c.LifecycleService        ] [main] - 
hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is 
STARTING
   2024-05-24 14:53:47,019 INFO  [c.h.c.LifecycleService        ] [main] - 
hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is 
STARTED
   2024-05-24 14:53:47,040 INFO  [.c.i.c.ClientConnectionManager] [main] - 
hz.client_1 [seatunnel] [5.1] Trying to connect to cluster: seatunnel
   2024-05-24 14:53:47,043 INFO  [.c.i.c.ClientConnectionManager] [main] - 
hz.client_1 [seatunnel] [5.1] Trying to connect to [localhost]:5801
   2024-05-24 14:53:47,072 INFO  [c.h.c.LifecycleService        ] [main] - 
hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is 
CLIENT_CONNECTED
   2024-05-24 14:53:47,073 INFO  [.c.i.c.ClientConnectionManager] [main] - 
hz.client_1 [seatunnel] [5.1] Authenticated with server 
[20.200.176.31]:5801:f60a94f5-bcec-4e8f-a403-070293dfc28e, server version: 5.1, 
local address: /127.0.0.1:52494
   2024-05-24 14:53:47,074 INFO  [c.h.i.d.Diagnostics           ] [main] - 
hz.client_1 [seatunnel] [5.1] Diagnostics disabled. To enable add 
-Dhazelcast.diagnostics.enabled=true to the JVM arguments.
   2024-05-24 14:53:47,082 INFO  [c.h.c.i.s.ClientClusterService] 
[hz.client_1.event-2] - hz.client_1 [seatunnel] [5.1] 
   
   Members [1] {
           Member [20.200.176.31]:5801 - f60a94f5-bcec-4e8f-a403-070293dfc28e
   }
   
   2024-05-24 14:53:47,105 INFO  [.c.i.s.ClientStatisticsService] [main] - 
Client statistics is enabled with period 5 seconds.
   2024-05-24 14:53:51,325 INFO  [c.h.c.LifecycleService        ] [main] - 
hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is 
SHUTTING_DOWN
   2024-05-24 14:53:51,328 INFO  [.c.i.c.ClientConnectionManager] [main] - 
hz.client_1 [seatunnel] [5.1] Removed connection to endpoint: 
[20.200.176.31]:5801:f60a94f5-bcec-4e8f-a403-070293dfc28e, connection: 
ClientConnection{alive=false, connectionId=1, 
channel=NioChannel{/127.0.0.1:52494->localhost/127.0.0.1:5801}, 
remoteAddress=[20.200.176.31]:5801, lastReadTime=2024-05-24 14:53:51.324, 
lastWriteTime=2024-05-24 14:53:47.233, closedTime=2024-05-24 14:53:51.326, 
connected server version=5.1}
   2024-05-24 14:53:51,328 INFO  [c.h.c.LifecycleService        ] [main] - 
hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is 
CLIENT_DISCONNECTED
   2024-05-24 14:53:51,330 INFO  [c.h.c.LifecycleService        ] [main] - 
hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is 
SHUTDOWN
   2024-05-24 14:53:51,330 INFO  [s.c.s.s.c.ClientExecuteCommand] [main] - 
Closed SeaTunnel client......
   
   
   2. Query the job status
   
   2024-05-24 14:55:18,096 INFO  [.c.i.s.ClientStatisticsService] [main] - 
Client statistics is enabled with period 5 seconds.
   Job ID              Job Name       Job Status      Submit Time              Finished Time
   ------------------  -------------  --------------  -----------------------  -----------------------
   846231481092145153  SeaTunnel_Job  SAVEPOINT_DONE  2024-05-24 13:55:38.84   2024-05-24 14:54:41.818
   
   3. Log of the client that submitted the job: the savepoint was saved normally and the job was then stopped
   
   2024-05-24 14:54:41,078 INFO  [o.a.s.e.c.j.JobMetricsRunner  ] 
[job-metrics-runner-846231481092145153] - 
   ***********************************************
              Job Progress Information
   ***********************************************
   Job Id                    :  846231481092145153
   Read Count So Far         :                 250
   Write Count So Far        :                 250
   Average Read Count        :                 0/s
   Average Write Count       :                 0/s
   Last Statistic Time       : 2024-05-24 14:53:41
   Current Statistic Time    : 2024-05-24 14:54:41
   ***********************************************
   
   2024-05-24 14:54:42,169 INFO  [o.a.s.e.c.j.ClientJobProxy    ] [main] - Job 
(846231481092145153) end with state SAVEPOINT_DONE
   2024-05-24 14:54:42,170 INFO  [s.c.s.s.c.ClientExecuteCommand] [main] - 
   ***********************************************
              Job Statistic Information
   ***********************************************
   Start Time                : 2024-05-24 13:55:38
   End Time                  : 2024-05-24 14:54:42
   Total Time(s)             :                3543
   Total Read Count          :                 250
   Total Write Count         :                 250
   Total Failed Count        :                   0
   ***********************************************
   
   2024-05-24 14:54:42,171 INFO  [c.h.c.LifecycleService        ] [main] - 
hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is 
SHUTTING_DOWN
   2024-05-24 14:54:42,176 INFO  [.c.i.c.ClientConnectionManager] [main] - 
hz.client_1 [seatunnel] [5.1] Removed connection to endpoint: 
[20.200.176.31]:5801:f60a94f5-bcec-4e8f-a403-070293dfc28e, connection: 
ClientConnection{alive=false, connectionId=1, 
channel=NioChannel{/127.0.0.1:47888->localhost/127.0.0.1:5801}, 
remoteAddress=[20.200.176.31]:5801, lastReadTime=2024-05-24 14:54:42.170, 
lastWriteTime=2024-05-24 14:54:42.169, closedTime=2024-05-24 14:54:42.174, 
connected server version=5.1}
   2024-05-24 14:54:42,176 INFO  [c.h.c.LifecycleService        ] [main] - 
hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is 
CLIENT_DISCONNECTED
   2024-05-24 14:54:42,178 INFO  [c.h.c.LifecycleService        ] [main] - 
hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is 
SHUTDOWN
   2024-05-24 14:54:42,179 INFO  [s.c.s.s.c.ClientExecuteCommand] [main] - 
Closed SeaTunnel client......
   2024-05-24 14:54:42,179 INFO  [s.c.s.s.c.ClientExecuteCommand] [main] - 
Closed metrics executor service ......
   2024-05-24 14:54:42,180 INFO  [s.c.s.s.c.ClientExecuteCommand] 
[ForkJoinPool.commonPool-worker-11] - run shutdown hook because get close signal
   
   
   4. Restart the job from the savepoint
   ./bin/seatunnel.sh --config ./config/test-source-kerberos-kafka.yaml  -r 846231481092145153
   The job fails with the following error:
   2024-05-24 14:56:01,286 INFO  [c.h.c.LifecycleService        ] [main] - 
hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is 
CLIENT_DISCONNECTED
   2024-05-24 14:56:01,288 INFO  [c.h.c.LifecycleService        ] [main] - 
hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is 
SHUTDOWN
   2024-05-24 14:56:01,289 INFO  [s.c.s.s.c.ClientExecuteCommand] [main] - 
Closed SeaTunnel client......
   2024-05-24 14:56:01,289 INFO  [s.c.s.s.c.ClientExecuteCommand] [main] - 
Closed metrics executor service ......
   2024-05-24 14:56:01,289 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - 
   
   
===============================================================================
   
   
   2024-05-24 14:56:01,289 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - 
Fatal Error, 
   
   2024-05-24 14:56:01,289 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - 
Please submit bug report in https://github.com/apache/seatunnel/issues
   
   2024-05-24 14:56:01,289 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - 
Reason:SeaTunnel job executed failed 
   
   2024-05-24 14:56:01,290 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - 
Exception 
StackTrace:org.apache.seatunnel.core.starter.exception.CommandExecuteException: 
SeaTunnel job executed failed
           at 
org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:202)
           at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
           at 
org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
   Caused by: 
org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: 
org.apache.seatunnel.engine.server.checkpoint.CheckpointException: 
CheckpointCoordinator inside have error.
           at 
org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.handleCoordinatorError(CheckpointCoordinator.java:274)
           at 
org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.handleCoordinatorError(CheckpointCoordinator.java:270)
           at 
org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.reportCheckpointErrorFromTask(CheckpointCoordinator.java:376)
           at 
org.apache.seatunnel.engine.server.checkpoint.CheckpointManager.reportCheckpointErrorFromTask(CheckpointManager.java:183)
           at 
org.apache.seatunnel.engine.server.checkpoint.operation.CheckpointErrorReportOperation.run(CheckpointErrorReportOperation.java:48)
           at 
com.hazelcast.spi.impl.operationservice.Operation.call(Operation.java:189)
           at 
com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.call(OperationRunnerImpl.java:273)
           at 
com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:248)
           at 
com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:213)
           at 
com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:175)
           at 
com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:139)
           at 
com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.executeRun(OperationThread.java:123)
           at 
com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102)
   Caused by: org.apache.seatunnel.common.utils.SeaTunnelException: 
java.lang.NullPointerException
   
           ... 11 more
   
           at 
org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:194)
           ... 2 more
    
   2024-05-24 14:56:01,290 ERROR [o.a.s.c.s.SeaTunnel           ] [main] - 
   
===============================================================================
   
   
   
   Exception in thread "main" 
org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel 
job executed failed
           at 
org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:202)
           at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
           at 
org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
   Caused by: 
org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: 
org.apache.seatunnel.engine.server.checkpoint.CheckpointException: 
CheckpointCoordinator inside have error.
           at 
org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.handleCoordinatorError(CheckpointCoordinator.java:274)
           at 
org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.handleCoordinatorError(CheckpointCoordinator.java:270)
           at 
org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.reportCheckpointErrorFromTask(CheckpointCoordinator.java:376)
           at 
org.apache.seatunnel.engine.server.checkpoint.CheckpointManager.reportCheckpointErrorFromTask(CheckpointManager.java:183)
           at 
org.apache.seatunnel.engine.server.checkpoint.operation.CheckpointErrorReportOperation.run(CheckpointErrorReportOperation.java:48)
           at 
com.hazelcast.spi.impl.operationservice.Operation.call(Operation.java:189)
           at 
com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.call(OperationRunnerImpl.java:273)
           at 
com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:248)
           at 
com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:213)
           at 
com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:175)
           at 
com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:139)
           at 
com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.executeRun(OperationThread.java:123)
           at 
com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102)
   Caused by: org.apache.seatunnel.common.utils.SeaTunnelException: 
java.lang.NullPointerException
   
           ... 11 more
   
           at 
org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:194)
           ... 2 more
   2024-05-24 14:56:01,292 INFO  [s.c.s.s.c.ClientExecuteCommand] 
[ForkJoinPool.commonPool-worker-18] - run shutdown hook because get close signal
   
   
   
   5. The cluster log shows the following:
   2024-05-24 15:07:02,023 INFO  [.s.t.SourceSplitEnumeratorTask] 
[hz.main.seaTunnel.task.thread-77] - received reader register, readerID: 
TaskLocation{taskGroupLocation=TaskGroupLocation{jobId=846231481092145153, 
pipelineId=1, taskGroupId=30001}, taskID=40001, index=1}
   2024-05-24 15:07:02,025 ERROR [.s.e.s.c.CheckpointCoordinator] 
[hz.main.generic-operation.thread-9] - report error from task
   org.apache.seatunnel.common.utils.SeaTunnelException: 
java.lang.NullPointerException
   
           at 
org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.reportCheckpointErrorFromTask(CheckpointCoordinator.java:376)
 ~[seatunnel-starter.jar:2.3.5]
           at 
org.apache.seatunnel.engine.server.checkpoint.CheckpointManager.reportCheckpointErrorFromTask(CheckpointManager.java:183)
 ~[seatunnel-starter.jar:2.3.5]
           at 
org.apache.seatunnel.engine.server.checkpoint.operation.CheckpointErrorReportOperation.run(CheckpointErrorReportOperation.java:48)
 ~[seatunnel-starter.jar:2.3.5]
           at 
com.hazelcast.spi.impl.operationservice.Operation.call(Operation.java:189) 
~[seatunnel-starter.jar:2.3.5]
           at 
com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.call(OperationRunnerImpl.java:273)
 ~[seatunnel-starter.jar:2.3.5]
           at 
com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:248)
 ~[seatunnel-starter.jar:2.3.5]
           at 
com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:213)
 ~[seatunnel-starter.jar:2.3.5]
           at 
com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:175)
 ~[seatunnel-starter.jar:2.3.5]
           at 
com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:139)
 ~[seatunnel-starter.jar:2.3.5]
           at 
com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.executeRun(OperationThread.java:123)
 ~[seatunnel-starter.jar:2.3.5]
           at 
com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102)
 ~[seatunnel-starter.jar:2.3.5]
   2024-05-24 15:07:02,026 INFO  [.s.e.s.c.CheckpointCoordinator] 
[hz.main.generic-operation.thread-9] - start clean pending checkpoint cause 
CheckpointCoordinator inside have error.
   2024-05-24 15:07:02,027 INFO  [.s.e.s.c.CheckpointCoordinator] 
[hz.main.generic-operation.thread-9] - Turn 
checkpoint_state_846231481092145153_1 state from RUNNING to FAILED
   
   
   The job status shows that the job has failed:
   Job ID              Job Name       Job Status  Submit Time              Finished Time
   ------------------  -------------  ----------  -----------------------  -----------------------
   846231481092145153  SeaTunnel_Job  FAILED      2024-05-24 15:09:25.644  2024-05-24 15:09:36.196
   
   ### SeaTunnel Version
   
   2.3.5
   
   ### SeaTunnel Config
   
   ```conf
   env {
     # You can set SeaTunnel environment configuration here
     parallelism = 2
     job.mode = "STREAMING"
     checkpoint.interval = 2000
   }
   
   source {
     # This is an example source plugin **only for test and demonstrate the feature source plugin**
     FakeSource {
       parallelism = 2
       result_table_name = "fake"
       row.num = 16
       schema = {
         fields {
           name = "string"
           age = "int"
         }
       }
     }
   
     # If you would like to get more information about how to configure SeaTunnel and see full list of source plugins,
     # please go to https://seatunnel.apache.org/docs/category/source-v2
   }
   
   sink {
     Console {
     }
   
     # If you would like to get more information about how to configure SeaTunnel and see full list of sink plugins,
     # please go to https://seatunnel.apache.org/docs/category/sink-v2
   }
   ```
   
   
   ### Running Command
   
   ```shell
   # Trigger the savepoint
   ./bin/seatunnel.sh -s 846231481092145153
   
   # Restore the job
   ./bin/seatunnel.sh --config ./config/v2.streaming.conf.template  -r 846231481092145153
   ```
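
For anyone trying to reproduce this, the two commands above can be wrapped in a small helper that waits for the savepoint to finish before restoring. This is only a sketch under stated assumptions, not part of the original report: `SEATUNNEL_SH` and `savepoint_and_restore` are hypothetical names, `-l` is assumed to print the job table shown in step 2 (the report does not say which command produced that table), and the job name is assumed to contain no spaces so that the status is the third whitespace-separated column.

```shell
# Assumption: SEATUNNEL_SH is a hypothetical variable pointing at the launcher.
: "${SEATUNNEL_SH:=./bin/seatunnel.sh}"

# Trigger a savepoint, wait until the job list reports SAVEPOINT_DONE,
# then restore the job from that savepoint.
# $1 = job ID, $2 = job config file, $3 = max polls (default 60, one per second)
savepoint_and_restore() {
  job_id="$1"
  config="$2"
  tries="${3:-60}"
  job_state=""

  # Same savepoint command as in the report.
  "$SEATUNNEL_SH" -s "$job_id" || return 1

  while [ "$tries" -gt 0 ]; do
    # Assumption: -l prints the job table; pick the status column of our job.
    job_state=$("$SEATUNNEL_SH" -l | awk -v id="$job_id" '$1 == id { print $3 }')
    [ "$job_state" = "SAVEPOINT_DONE" ] && break
    tries=$((tries - 1))
    sleep 1
  done

  # Give up (exit code 2) if the savepoint never completed.
  [ "$job_state" = "SAVEPOINT_DONE" ] || return 2

  # Same restore command as in the report.
  "$SEATUNNEL_SH" --config "$config" -r "$job_id"
}
```

A call such as `savepoint_and_restore 846231481092145153 ./config/v2.streaming.conf.template` would then run the same sequence as in this report. The wait matters because in the logs above the savepoint command returned at 14:53:51, while the job only reached SAVEPOINT_DONE at about 14:54:42.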
   
   
   ### Error Exception
   
   ```log
   For the exception details, see the last part of the "What happened" section above
   ```
   
   
   ### Zeta or Flink or Spark Version
   
   zeta
   
   ### Java or Scala Version
   
   java1.8
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   

