Sephiroth1024 commented on issue #10442:
URL: https://github.com/apache/seatunnel/issues/10442#issuecomment-3841456223
Here is the stack trace.
PS: We have two master nodes, master1 (active) and master2. This trace is from
master1.
java.util.concurrent.ExecutionException:
com.hazelcast.core.OperationTimeoutException: GetOperation invocation failed to
complete due to operation-heartbeat-timeout. Current time: 2026-02-03
16:44:16.124. Start time: 2026-02-03 16:43:25.282. Total elapsed time: 50842
ms. Last operation heartbeat: never. Last operation heartbeat from member:
2026-02-03 16:44:15.671.
Invocation{op=com.hazelcast.map.impl.operation.GetOperation{serviceName='hz:impl:mapService',
identityHash=1942368721, partitionId=122, replicaIndex=0, callId=-402091963,
invocationTime=1770108205275 (2026-02-03 16:43:25.275), waitTimeout=-1,
callTimeout=25000,
tenantControl=com.hazelcast.spi.impl.tenantcontrol.NoopTenantControl@0,
name=engine_ownedSlotProfilesIMap}, tryCount=10, tryPauseMillis=500,
invokeCount=1, callTimeoutMillis=25000, firstInvocationTimeMs=1770108205282,
firstInvocationTime='2026-02-03 16:43:25.282', lastHeartbeatMillis=0,
lastHeartbeatTime='1970-01-01 08:00:00.000',
target=[my-seatunnel-master2.xxx.cn]:5801, pendingResponse={VOID}, backupsAcksExpected=-1,
backupsAcksReceived=0, connection=Connection[id=62,
/xx.xx.xx.xx:5801->/yy.yy.yy.yy:27039, qualifier=null,
endpoint=[my-seatunnel-master2.xxx.cn]:5801,
remoteUuid=48b70b42-3eba-42f5-afe0-abac2cb877bd, alive=true,
connectionType=MEMBER, planeIndex=0]}
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at org.apache.seatunnel.engine.common.utils.concurrent.CompletableFuture.get(CompletableFuture.java:147)
at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.lambda$startTriggerPendingCheckpoint$10(CheckpointCoordinator.java:659)
at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:656)
at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
at org.apache.seatunnel.engine.common.utils.concurrent.CompletableFuture.lambda$new$0(CompletableFuture.java:66)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:580)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at org.apache.seatunnel.api.tracing.MDCRunnable.run(MDCRunnable.java:43)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.hazelcast.core.OperationTimeoutException: GetOperation
invocation failed to complete due to operation-heartbeat-timeout. Current time:
2026-02-03 16:44:16.124. Start time: 2026-02-03 16:43:25.282. Total elapsed
time: 50842 ms. Last operation heartbeat: never. Last operation heartbeat from
member: 2026-02-03 16:44:15.671.
Invocation{op=com.hazelcast.map.impl.operation.GetOperation{serviceName='hz:impl:mapService',
identityHash=1942368721, partitionId=122, replicaIndex=0, callId=-402091963,
invocationTime=1770108205275 (2026-02-03 16:43:25.275), waitTimeout=-1,
callTimeout=25000,
tenantControl=com.hazelcast.spi.impl.tenantcontrol.NoopTenantControl@0,
name=engine_ownedSlotProfilesIMap}, tryCount=10, tryPauseMillis=500,
invokeCount=1, callTimeoutMillis=25000, firstInvocationTimeMs=1770108205282,
firstInvocationTime='2026-02-03 16:43:25.282', lastHeartbeatMillis=0,
lastHeartbeatTime='1970-01-01 08:00:00.000',
target=[my-seatunnel-master2.xxx.cn]:5801, pendingResponse={VOID},
backupsAcksExpected=-1, backupsAcksReceived=0,
connection=Connection[id=62, /xx.xx.xx.xx:5801->/yy.yy.yy.yy:27039,
qualifier=null, endpoint=[my-seatunnel-master2.xxx.cn]:5801,
remoteUuid=48b70b42-3eba-42f5-afe0-abac2cb877bd, alive=true,
connectionType=MEMBER, planeIndex=0]}
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.newOperationTimeoutException(InvocationFuture.java:194)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:136)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrowIfException(InvocationFuture.java:99)
at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:617)
at com.hazelcast.map.impl.proxy.MapProxySupport.invokeOperation(MapProxySupport.java:479)
at com.hazelcast.map.impl.proxy.MapProxySupport.getInternal(MapProxySupport.java:371)
at com.hazelcast.map.impl.proxy.MapProxyImpl.get(MapProxyImpl.java:123)
at org.apache.seatunnel.engine.server.master.JobMaster.queryTaskGroupAddress(JobMaster.java:792)
at org.apache.seatunnel.engine.server.checkpoint.CheckpointManager.sendOperationToMemberNode(CheckpointManager.java:316)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:545)
at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:438)
at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.triggerCheckpoint(CheckpointCoordinator.java:797)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at org.apache.seatunnel.api.tracing.MDCRunnable.run(MDCRunnable.java:43)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
at ------ submitted from ------.()
at com.hazelcast.internal.util.ExceptionUtil.cloneExceptionWithFixedAsyncStackTrace(ExceptionUtil.java:336)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.returnOrThrowWithGetConventions(InvocationFuture.java:112)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrowIfException(InvocationFuture.java:100)
at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:617)
at com.hazelcast.map.impl.proxy.MapProxySupport.invokeOperation(MapProxySupport.java:479)
at com.hazelcast.map.impl.proxy.MapProxySupport.getInternal(MapProxySupport.java:371)
at com.hazelcast.map.impl.proxy.MapProxyImpl.get(MapProxyImpl.java:123)
at org.apache.seatunnel.engine.server.master.JobMaster.queryTaskGroupAddress(JobMaster.java:792)
at org.apache.seatunnel.engine.server.checkpoint.CheckpointManager.sendOperationToMemberNode(CheckpointManager.java:316)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:545)
at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260)
at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:438)
at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.triggerCheckpoint(CheckpointCoordinator.java:797)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
... 5 more
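For anyone reading the trace, the timing fields in the exception message can be cross-checked with a few lines of plain Java. This is only a rough sketch: the class and method names (`HeartbeatTimeoutCheck`, `elapsedMillis`, `formatEpochMillis`) are made up for illustration, and the UTC+8 offset is an assumption inferred from `lastHeartbeatMillis=0` being printed as `1970-01-01 08:00:00.000`. It uses only `java.time`, no Hazelcast or SeaTunnel APIs.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of SeaTunnel or Hazelcast.
public class HeartbeatTimeoutCheck {
    // Timestamp layout used in the exception message.
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Milliseconds between two timestamps copied from the log.
    static long elapsedMillis(String start, String current) {
        return Duration.between(
                LocalDateTime.parse(start, FMT),
                LocalDateTime.parse(current, FMT)).toMillis();
    }

    // Renders an epoch-millis field (e.g. invocationTime) at a fixed offset.
    static String formatEpochMillis(long epochMillis, ZoneOffset offset) {
        return Instant.ofEpochMilli(epochMillis).atOffset(offset).format(FMT);
    }

    public static void main(String[] args) {
        String start = "2026-02-03 16:43:25.282";        // "Start time"
        String current = "2026-02-03 16:44:16.124";      // "Current time"
        String memberBeat = "2026-02-03 16:44:15.671";   // "Last operation heartbeat from member"
        long callTimeoutMillis = 25_000L;                // callTimeout in the log

        long elapsed = elapsedMillis(start, current);
        System.out.println("elapsed ms = " + elapsed);   // 50842, matches "Total elapsed time"
        System.out.println("elapsed / callTimeout = "
                + (double) elapsed / callTimeoutMillis);
        System.out.println("ms since member heartbeat = "
                + elapsedMillis(memberBeat, current));   // 453

        // Assumption: the node logs in UTC+8.
        System.out.println(formatEpochMillis(1770108205275L, ZoneOffset.ofHours(8)));
    }
}
```

The numbers line up with the log: the `get` on `engine_ownedSlotProfilesIMap` waited 50842 ms, a little over twice the 25000 ms `callTimeout`, and master2 had sent a member-level heartbeat 453 ms before the timeout even though no heartbeat for this particular operation was ever recorded.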
When I run the Arthas command
`watch org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator scheduleTriggerPendingCheckpoint '{target}' 'target.jobId==2013177567893491713L' -x 2`,
the output is as shown below (`pendingCounter` is 1):
<img width="2876" height="1578" alt="Image"
src="https://github.com/user-attachments/assets/0aba0f0a-3721-4b45-9e17-50f455c74bed"
/>