reele commented on issue #17342:
URL:
https://github.com/apache/dolphinscheduler/issues/17342#issuecomment-3086365872
> We do update the workflow's host, so there might be a concurrency issue. It's
> better to get the workflow's host from the executor: once the event is ready
> to report, get the host from the executor.
>
> ```
> public boolean reassignWorkflowInstanceHost(final TaskExecutorReassignMasterRequest taskExecutorReassignMasterRequest) {
>     final int taskInstanceId = taskExecutorReassignMasterRequest.getTaskInstanceId();
>     final String workflowHost = taskExecutorReassignMasterRequest.getWorkflowHost();
>     // todo: Is this reassign can make sure there is no concurrent problem?
>     physicalTaskExecutorRepository.get(taskInstanceId).ifPresent(
>             taskExecutor -> taskExecutor.getTaskExecutionContext().setWorkflowInstanceHost(workflowHost));
>     return physicalTaskExecutorEventReporter.reassignWorkflowInstanceHost(taskInstanceId, workflowHost);
> }
> ```
Oh, I found why! It's caused by this issue. Here are the other logs:
```
[WI-0][TI-0] - 2025-07-10 20:30:54.101 ERROR [MasterCommandHandleThreadPool] o.a.d.s.m.e.c.CommandEngine:[186] - Failed bootstrap command {
"id" : 4889016,
"commandType" : "RECOVER_TOLERANCE_FAULT_PROCESS",
"workflowDefinitionCode" : 15081302155680,
"workflowDefinitionVersion" : 20,
"workflowInstanceId" : 4828292,
"commandParam" : "{\"commandType\":\"RECOVER_TOLERANCE_FAULT\",\"subWorkflowInstance\":false,\"startNodes\":null,\"commandParams\":null,\"timeZone\":null,\"workflowExecutionStatus\":\"RUNNING_EXECUTION\"}",
"workflowInstancePriority" : "MEDIUM",
"executorId" : 0,
"taskDependType" : "TASK_POST",
"failureStrategy" : "CONTINUE",
"warningType" : "NONE",
"warningGroupId" : null,
"scheduleTime" : null,
"startTime" : null,
"updateTime" : "2025-07-10 20:30:53",
"workerGroup" : null,
"tenantCode" : "default",
"environmentCode" : -1,
"dryRun" : 0
}
java.util.concurrent.CompletionException: java.lang.IllegalStateException: WorkflowExecuteRunnable(4828292/WORKFLOW-A-20250710194500099 already registered at WorkflowEventBusFireWorker
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
    at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:673)
    at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1609)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: WorkflowExecuteRunnable(4828292/WORKFLOW-A-20250710194500099 already registered at WorkflowEventBusFireWorker
    at com.google.common.base.Preconditions.checkState(Preconditions.java:821)
    at org.apache.dolphinscheduler.server.master.engine.WorkflowEventBusFireWorker.registerWorkflowEventBus(WorkflowEventBusFireWorker.java:63)
    at org.apache.dolphinscheduler.server.master.engine.WorkflowEventBusCoordinator.registerWorkflowEventBus(WorkflowEventBusCoordinator.java:50)
    at org.apache.dolphinscheduler.server.master.engine.command.CommandEngine.bootstrapWorkflowExecutionRunnable(CommandEngine.java:167)
    at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670)
    ... 6 common frames omitted
```
After master-2.1.21 started, it also published a failover command, and by
coincidence master 2.1.20 picked that command up. Master 2.1.20 called
`bootstrapCommand` in `CommandEngine` and then failed in
`bootstrapWorkflowExecutionRunnable`. By that point the task executor had
already been reassigned to master 2.1.20 again, and the new
`workflowExecutionRunnable` had already been put into `workflowRepository`,
but `workflowEventBusCoordinator.registerWorkflowEventBus` then failed. As a
result, no thread handles the new `workflowExecutionRunnable`'s event bus, and
all the events got stuck in the executor's channel.
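
To make the ordering problem concrete, here is a minimal, self-contained sketch of the failure mode and one possible mitigation. It is not the DolphinScheduler code: `BootstrapRollbackSketch`, `FireWorker`, `bootstrapUnsafe`, `bootstrapWithRollback` and `isOrphaned` are all hypothetical names, and the two maps only stand in for `workflowRepository` and the worker's registry. The rollback is also not atomic across the two maps, so treat it as an illustration of the idea rather than a proposed patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BootstrapRollbackSketch {

    /** Hypothetical stand-in for the worker that drains registered event buses. */
    public static final class FireWorker {
        private final Map<Integer, Object> registered = new ConcurrentHashMap<>();

        /** Rejects a second registration for the same workflow instance id. */
        public void register(int workflowInstanceId, Object runnable) {
            if (registered.putIfAbsent(workflowInstanceId, runnable) != null) {
                throw new IllegalStateException(
                        "Workflow " + workflowInstanceId + " already registered");
            }
        }

        public boolean isHandling(int workflowInstanceId, Object runnable) {
            return registered.get(workflowInstanceId) == runnable;
        }
    }

    // Hypothetical stand-in for workflowRepository.
    public final Map<Integer, Object> repository = new ConcurrentHashMap<>();
    public final FireWorker fireWorker = new FireWorker();

    /** Order described above: store first, register second, no cleanup on failure. */
    public void bootstrapUnsafe(int id, Object runnable) {
        repository.put(id, runnable);
        fireWorker.register(id, runnable); // may throw and leave an orphan in the repository
    }

    /** One possible mitigation: restore the previous repository entry if registration fails. */
    public void bootstrapWithRollback(int id, Object runnable) {
        Object previous = repository.put(id, runnable);
        try {
            fireWorker.register(id, runnable);
        } catch (IllegalStateException e) {
            if (previous != null) {
                repository.put(id, previous);   // put the still-registered runnable back
            } else {
                repository.remove(id, runnable);
            }
            throw e;
        }
    }

    /** True when the repository entry has no worker handling its events. */
    public boolean isOrphaned(int id) {
        Object current = repository.get(id);
        return current != null && !fireWorker.isHandling(id, current);
    }
}
```

With `bootstrapUnsafe`, a second bootstrap for the same id throws and leaves the repository pointing at a runnable that no worker handles, which matches the stuck-events symptom. With `bootstrapWithRollback`, the repository keeps pointing at the runnable that is still registered.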