Sephiroth1024 commented on issue #10442:
URL: https://github.com/apache/seatunnel/issues/10442#issuecomment-3841456223

   Here is the stack trace.
   PS: we have two master nodes, master1 (active) and master2; this trace is 
from master1.
   
   java.util.concurrent.ExecutionException: 
com.hazelcast.core.OperationTimeoutException: GetOperation invocation failed to 
complete due to operation-heartbeat-timeout. Current time: 2026-02-03 
16:44:16.124. Start time: 2026-02-03 16:43:25.282. Total elapsed time: 50842 
ms. Last operation heartbeat: never. Last operation heartbeat from member: 
2026-02-03 16:44:15.671. 
Invocation{op=com.hazelcast.map.impl.operation.GetOperation{serviceName='hz:impl:mapService',
 identityHash=1942368721, partitionId=122, replicaIndex=0, callId=-402091963, 
invocationTime=1770108205275 (2026-02-03 16:43:25.275), waitTimeout=-1, 
callTimeout=25000, 
tenantControl=com.hazelcast.spi.impl.tenantcontrol.NoopTenantControl@0, 
name=engine_ownedSlotProfilesIMap}, tryCount=10, tryPauseMillis=500, 
invokeCount=1, callTimeoutMillis=25000, firstInvocationTimeMs=1770108205282, 
firstInvocationTime='2026-02-03 16:43:25.282', lastHeartbeatMillis=0, 
lastHeartbeatTime='1970-01-01 08:00:00.000', 
target=[my-seatunnel-master2.xxx.cn]:5801, pendingResponse={VOID}, 
backupsAcksExpected=-1, 
backupsAcksReceived=0, connection=Connection[id=62, 
/xx.xx.xx.xx:5801->/yy.yy.yy.yy:27039, qualifier=null, 
endpoint=[my-seatunnel-master2.xxx.cn]:5801, 
remoteUuid=48b70b42-3eba-42f5-afe0-abac2cb877bd, alive=true, 
connectionType=MEMBER, planeIndex=0]}
   at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
   at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
   at 
org.apache.seatunnel.engine.common.utils.concurrent.CompletableFuture.get(CompletableFuture.java:147)
   at 
org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.lambda$startTriggerPendingCheckpoint$10(CheckpointCoordinator.java:659)
   at 
java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656)
   at 
java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632)
   at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
   at 
java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
   at 
org.apache.seatunnel.engine.common.utils.concurrent.CompletableFuture.lambda$new$0(CompletableFuture.java:66)
   at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
   at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
   at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
   at 
java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
   at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:580)
   at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
   at org.apache.seatunnel.api.tracing.MDCRunnable.run(MDCRunnable.java:43)
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:748)
   Caused by: com.hazelcast.core.OperationTimeoutException: GetOperation 
invocation failed to complete due to operation-heartbeat-timeout. Current time: 
2026-02-03 16:44:16.124. Start time: 2026-02-03 16:43:25.282. Total elapsed 
time: 50842 ms. Last operation heartbeat: never. Last operation heartbeat from 
member: 2026-02-03 16:44:15.671. 
Invocation{op=com.hazelcast.map.impl.operation.GetOperation{serviceName='hz:impl:mapService',
 identityHash=1942368721, partitionId=122, replicaIndex=0, callId=-402091963, 
invocationTime=1770108205275 (2026-02-03 16:43:25.275), waitTimeout=-1, 
callTimeout=25000, 
tenantControl=com.hazelcast.spi.impl.tenantcontrol.NoopTenantControl@0, 
name=engine_ownedSlotProfilesIMap}, tryCount=10, tryPauseMillis=500, 
invokeCount=1, callTimeoutMillis=25000, firstInvocationTimeMs=1770108205282, 
firstInvocationTime='2026-02-03 16:43:25.282', lastHeartbeatMillis=0, 
lastHeartbeatTime='1970-01-01 08:00:00.000', 
target=[my-seatunnel-master2.xxx.cn]:5801, pendingResponse={VOID}, 
backupsAcksExpected=-1, backupsAcksReceived=0, 
connection=Connection[id=62, /xx.xx.xx.xx:5801->/yy.yy.yy.yy:27039, 
qualifier=null, endpoint=[my-seatunnel-master2.xxx.cn]:5801, 
remoteUuid=48b70b42-3eba-42f5-afe0-abac2cb877bd, alive=true, 
connectionType=MEMBER, planeIndex=0]}
   at 
com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.newOperationTimeoutException(InvocationFuture.java:194)
   at 
com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:136)
   at 
com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrowIfException(InvocationFuture.java:99)
   at 
com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:617)
   at 
com.hazelcast.map.impl.proxy.MapProxySupport.invokeOperation(MapProxySupport.java:479)
   at 
com.hazelcast.map.impl.proxy.MapProxySupport.getInternal(MapProxySupport.java:371)
   at com.hazelcast.map.impl.proxy.MapProxyImpl.get(MapProxyImpl.java:123)
   at 
org.apache.seatunnel.engine.server.master.JobMaster.queryTaskGroupAddress(JobMaster.java:792)
   at 
org.apache.seatunnel.engine.server.checkpoint.CheckpointManager.sendOperationToMemberNode(CheckpointManager.java:316)
   at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
   at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
   at 
java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
   at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:545)
   at 
java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
   at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:438)
   at 
org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.triggerCheckpoint(CheckpointCoordinator.java:797)
   at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
   at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
   at 
java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
   at org.apache.seatunnel.api.tracing.MDCRunnable.run(MDCRunnable.java:43)
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:748)
   at ------ submitted from ------.()
   at 
com.hazelcast.internal.util.ExceptionUtil.cloneExceptionWithFixedAsyncStackTrace(ExceptionUtil.java:336)
   at 
com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.returnOrThrowWithGetConventions(InvocationFuture.java:112)
   at 
com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrowIfException(InvocationFuture.java:100)
   at 
com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:617)
   at 
com.hazelcast.map.impl.proxy.MapProxySupport.invokeOperation(MapProxySupport.java:479)
   at 
com.hazelcast.map.impl.proxy.MapProxySupport.getInternal(MapProxySupport.java:371)
   at com.hazelcast.map.impl.proxy.MapProxyImpl.get(MapProxyImpl.java:123)
   at 
org.apache.seatunnel.engine.server.master.JobMaster.queryTaskGroupAddress(JobMaster.java:792)
   at 
org.apache.seatunnel.engine.server.checkpoint.CheckpointManager.sendOperationToMemberNode(CheckpointManager.java:316)
   at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
   at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
   at 
java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
   at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:545)
   at 
java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
   at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:438)
   at 
org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.triggerCheckpoint(CheckpointCoordinator.java:797)
   at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
   at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
   ... 5 more
   
   When I run the Arthas command `watch 
org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator 
scheduleTriggerPendingCheckpoint '{target}' 
'target.jobId==2013177567893491713L' -x 2`, the output is as shown below 
(`pendingCounter` is 1):
   
   <img width="2876" height="1578" alt="Image" 
src="https://github.com/user-attachments/assets/0aba0f0a-3721-4b45-9e17-50f455c74bed" />
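
As a sanity check on the numbers in the exception message, here is a minimal Java sketch that recomputes the elapsed time from the two logged timestamps and renders the epoch-millisecond fields as wall-clock time. The class name `TraceTimestamps` and its helper methods are hypothetical, introduced only for illustration. The UTC+8 offset is an assumption inferred from `lastHeartbeatTime='1970-01-01 08:00:00.000'` being printed for `lastHeartbeatMillis=0`.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of SeaTunnel or Hazelcast.
public class TraceTimestamps {
    // Timestamp layout used in the exception message above.
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Assumption: the node logs in UTC+8 (inferred from the trace, not confirmed).
    private static final ZoneOffset ASSUMED_OFFSET = ZoneOffset.ofHours(8);

    // Milliseconds between two timestamps written in the log's layout.
    public static long elapsedMillis(String start, String current) {
        LocalDateTime s = LocalDateTime.parse(start, FMT);
        LocalDateTime c = LocalDateTime.parse(current, FMT);
        return Duration.between(s, c).toMillis();
    }

    // Renders an epoch-millisecond value in the log's layout at the assumed offset.
    public static String render(long epochMillis) {
        return FMT.format(Instant.ofEpochMilli(epochMillis).atOffset(ASSUMED_OFFSET));
    }

    public static void main(String[] args) {
        // "Start time" and "Current time" copied from the exception message.
        long elapsed = elapsedMillis("2026-02-03 16:43:25.282", "2026-02-03 16:44:16.124");
        System.out.println("elapsed ms = " + elapsed); // prints 50842, matching the log

        // callTimeoutMillis as reported in the Invocation dump.
        long callTimeoutMillis = 25_000L;
        System.out.println("elapsed / callTimeout = " + (double) elapsed / callTimeoutMillis);

        // invocationTime and lastHeartbeatMillis from the Invocation dump.
        System.out.println(render(1770108205275L)); // 2026-02-03 16:43:25.275
        System.out.println(render(0L));             // 1970-01-01 08:00:00.000
    }
}
```

The recomputed 50842 ms agrees with "Total elapsed time" in the message and is roughly twice the reported `callTimeoutMillis=25000`. The 1970 value for `lastHeartbeatTime` is just epoch zero at the assumed offset, which is consistent with "Last operation heartbeat: never".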
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
