johnny2002 opened a new issue, #17732: URL: https://github.com/apache/dolphinscheduler/issues/17732
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.

### What happened

Tasks hang in the submitted status with no further information. Here is the master log:

[WI-0][TI-0] - 2025-11-26 02:26:44.426 WARN [Curator-TreeCache-0] o.a.d.s.m.c.AbstractClusterSubscribeListener:[45] - Server MasterServerMetadata(super=BaseServerMetadata(processId=1629687, serverStartupTime=1764123193415, address=10.16.10.119:5678, cpuUsage=0.0012515644555694619, memoryUsage=0.09452163781361474, serverStatus=NORMAL)) removed
[WI-0][TI-0] - 2025-11-26 02:26:44.426 WARN [Curator-TreeCache-0] o.a.d.s.m.c.MasterSlotManager:[75] - Do rebalance failed, cannot found the current master: 10.16.10.119:5678 in the normal master clusters: []. Please check the current master server status
[WI-0][TI-0] - 2025-11-26 02:26:44.426 INFO [Curator-TreeCache-0] o.a.d.s.m.e.s.SystemEventBus:[40] - Published SystemEvent: MasterFailoverEvent{masterServerMetadata='MasterServerMetadata(super=BaseServerMetadata(processId=1629687, serverStartupTime=1764123193415, address=10.16.10.119:5678, cpuUsage=0.0012515644555694619, memoryUsage=0.09452163781361474, serverStatus=NORMAL))', eventTime=Wed Nov 26 02:26:44 UTC 2025, delayTime=30000}
[WI-0][TI-0] - 2025-11-26 02:26:44.427 WARN [Curator-TreeCache-0] o.a.d.s.m.c.AbstractClusterSubscribeListener:[45] - Server WorkerServerMetadata(workerGroup=default, workerWeight=100.0, taskThreadPoolUsage=0.0) removed
[WI-0][TI-0] - 2025-11-26 02:26:44.427 INFO [Curator-TreeCache-0] o.a.d.s.m.e.s.SystemEventBus:[40] - Published SystemEvent: WorkerFailoverEvent{workerServerMetadata='WorkerServerMetadata(workerGroup=default, workerWeight=100.0, taskThreadPoolUsage=0.0)', eventTime=Wed Nov 26 02:26:44 UTC 2025, delayTime=30000}
[WI-0][TI-0] - 2025-11-26 02:26:44.434 INFO [Curator-TreeCache-0] o.a.d.r.a.h.DefaultServerStatusChangeListener:[32] - The status is standby now.
[WI-0][TI-0] - 2025-11-26 02:26:44.434 INFO [Curator-TreeCache-0] o.a.d.s.m.e.TaskGroupCoordinator:[463] - TaskGroupCoordinator closed
[WI-0][TI-0] - 2025-11-26 02:26:44.435 ERROR [Thread-20] o.a.d.c.t.ThreadUtils:[80] - Current thread sleep error
java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.dolphinscheduler.common.thread.ThreadUtils.sleep(ThreadUtils.java:77)
    at org.apache.dolphinscheduler.server.master.engine.TaskGroupCoordinator.doStart(TaskGroupCoordinator.java:121)
    at java.lang.Thread.run(Thread.java:750)
[WI-0][TI-0] - 2025-11-26 02:26:44.879 INFO [Curator-TreeCache-0] o.a.d.s.m.c.AbstractClusterSubscribeListener:[41] - Server WorkerServerMetadata(workerGroup=default, workerWeight=100.0, taskThreadPoolUsage=0.0) added
[WI-0][TI-0] - 2025-11-26 02:26:45.260 WARN [MasterCommandLoopThread] o.a.d.s.m.e.c.IdSlotBasedCommandFetcher:[60] - MasterSlotManager check slot (-1 -> 1)is invalidated.
[WI-0][TI-0] - 2025-11-26 02:26:45.917 INFO [Curator-TreeCache-0] o.a.d.s.m.c.AbstractClusterSubscribeListener:[41] - Server MasterServerMetadata(super=BaseServerMetadata(processId=1629687, serverStartupTime=1764123193415, address=10.16.10.119:5678, cpuUsage=0.003740648379052369, memoryUsage=0.09459177738163037, serverStatus=NORMAL)) added
[WI-0][TI-0] - 2025-11-26 02:26:45.917 INFO [Curator-TreeCache-0] o.a.d.s.m.c.MasterSlotManager:[89] - Do rebalance success, current master slot: 0, total master slots: 1
[WI-0][TI-0] - 2025-11-26 02:27:14.427 INFO [SystemEventBusFireWorker] o.a.d.s.m.f.FailoverCoordinator:[105] - Master[MasterServerMetadata(super=BaseServerMetadata(processId=1629687, serverStartupTime=1764123193415, address=10.16.10.119:5678, cpuUsage=0.0012515644555694619, memoryUsage=0.09452163781361474, serverStatus=NORMAL))] failover starting
[WI-0][TI-0] - 2025-11-26 02:27:14.427 INFO [SystemEventBusFireWorker] o.a.d.s.m.f.FailoverCoordinator:[113] - The 
master[MasterServerMetadata(super=BaseServerMetadata(processId=1629687, serverStartupTime=1764123193415, address=10.16.10.119:5678, cpuUsage=0.0012515644555694619, memoryUsage=0.09452163781361474, serverStatus=NORMAL))] is alive, maybe it reconnect to registry skip failover
[WI-0][TI-0] - 2025-11-26 02:27:14.427 INFO [SystemEventBusFireWorker] o.a.d.s.m.e.s.SystemEventBusFireWorker:[103] - Fire SystemEvent: MasterFailoverEvent{masterServerMetadata='MasterServerMetadata(super=BaseServerMetadata(processId=1629687, serverStartupTime=1764123193415, address=10.16.10.119:5678, cpuUsage=0.0012515644555694619, memoryUsage=0.09452163781361474, serverStatus=NORMAL))', eventTime=Wed Nov 26 02:26:44 UTC 2025, delayTime=30000} cost: 0 ms
[WI-0][TI-0] - 2025-11-26 02:27:14.427 INFO [SystemEventBusFireWorker] o.a.d.s.m.f.FailoverCoordinator:[191] - Worker[WorkerServerMetadata(workerGroup=default, workerWeight=100.0, taskThreadPoolUsage=0.0)] failover starting
[WI-0][TI-0] - 2025-11-26 02:27:14.427 INFO [SystemEventBusFireWorker] o.a.d.s.m.f.FailoverCoordinator:[198] - The worker[WorkerServerMetadata(workerGroup=default, workerWeight=100.0, taskThreadPoolUsage=0.0)] is alive, maybe it reconnect to registry skip failover
[WI-0][TI-0] - 2025-11-26 02:27:14.427 INFO [SystemEventBusFireWorker] o.a.d.s.m.e.s.SystemEventBusFireWorker:[103] - Fire SystemEvent: WorkerFailoverEvent{workerServerMetadata='WorkerServerMetadata(workerGroup=default, workerWeight=100.0, taskThreadPoolUsage=0.0)', eventTime=Wed Nov 26 02:26:44 UTC 2025, delayTime=30000} cost: 0 ms
[WI-0][TI-0] - 2025-11-26 02:28:06.455 INFO [MasterCommandHandleThreadPool] o.a.d.s.m.e.WorkflowEventBus:[41] - Publish event: WorkflowStartLifecycleEvent{workflow=03.CUST_加载(dim)_fr_project_code_dealer-v2-20251126022806126}
[WI-0][TI-0] - 2025-11-26 02:28:06.456 INFO [MasterCommandHandleThreadPool] o.a.d.s.m.e.c.CommandEngine:[174] - Success bootstrap command { "id" : 8928, "commandType" : "START_PROCESS", "workflowDefinitionCode" : 
18819871298950, "workflowDefinitionVersion" : 19, "workflowInstanceId" : 14481, "commandParam" : "{\"commandType\":\"START_PROCESS\",\"subWorkflowInstance\":false,\"startNodes\":[],\"commandParams\":[{\"prop\":\"bizDate\",\"direct\":\"IN\",\"type\":\"VARCHAR\",\"value\":\"$[yyyy-MM-dd-1]\"},{\"prop\":\"tableName\",\"direct\":\"IN\",\"type\":\"VARCHAR\",\"value\":\"fr_project_code_dealer\"},{\"prop\":\"srcSystem\",\"direct\":\"IN\",\"type\":\"VARCHAR\",\"value\":\"yecai\"},{\"prop\":\"DMP_DB\",\"direct\":\"IN\",\"type\":\"VARCHAR\",\"value\":\"cust\"},{\"prop\":\"SRC_DB\",\"direct\":\"IN\",\"type\":\"VARCHAR\",\"value\":\"loan\"},{\"prop\":\"slctColums\",\"direct\":\"IN\",\"type\":\"VARCHAR\",\"value\":\"t.project_code,t.dealer_name,t.create_time,t.yewuyuan\"},{\"prop\":\"dof\",\"direct\":\"IN\",\"type\":\"VARCHAR\",\"value\":\"dim\"},{\"prop\":\"tableIdCol\",\"direct\":\"IN\",\"type\":\"VARCHAR\",\"value\":\"project_code\"}],\"timeZone\":\"UTC\"}", "workflowInstancePriority" : "MEDIUM", "executorId" : 0, "taskDependType" : "TASK_POST", "failureStrategy" : "CONTINUE", "warningType" : "NONE", "warningGroupId" : null, "scheduleTime" : null, "startTime" : null, "updateTime" : "2025-11-26 02:28:06", "workerGroup" : null, "tenantCode" : "default", "environmentCode" : -1, "dryRun" : 0 }
[WI-14481][TI-0] - 2025-11-26 02:28:06.471 INFO [ds-workflow-eventbus-worker-3] o.a.d.s.m.e.w.l.h.AbstractWorkflowLifecycleEventHandler:[47] - Begin fire workflow 03.CUST_加载(dim)_fr_project_code_dealer-v2-20251126022806126 LifecycleEvent[WorkflowStartLifecycleEvent{workflow=03.CUST_加载(dim)_fr_project_code_dealer-v2-20251126022806126}] with state: RUNNING_EXECUTION
[WI-14481][TI-0] - 2025-11-26 02:28:06.471 INFO [ds-workflow-eventbus-worker-3] o.a.d.s.m.e.WorkflowEventBus:[41] - Publish event: TaskStartLifecycleEvent{task=mysql->hdfs}
[WI-14481][TI-0] - 2025-11-26 02:28:06.471 INFO [ds-workflow-eventbus-worker-3] o.a.d.s.m.e.w.l.h.AbstractWorkflowLifecycleEventHandler:[52] - Fired workflow 
03.CUST_加载(dim)_fr_project_code_dealer-v2-20251126022806126 LifecycleEvent[WorkflowStartLifecycleEvent{workflow=03.CUST_加载(dim)_fr_project_code_dealer-v2-20251126022806126}] with state: RUNNING_EXECUTION
[WI-14481][TI-0] - 2025-11-26 02:28:06.482 INFO [ds-workflow-eventbus-worker-3] o.a.d.s.m.e.WorkflowEventBus:[41] - Publish event: TaskDispatchLifecycleEvent{task=mysql->hdfs}
[WI-14481][TI-0] - 2025-11-26 02:28:06.482 INFO [ds-workflow-eventbus-worker-3] o.a.d.s.m.e.t.l.h.AbstractTaskLifecycleEventHandler:[47] - Fired task mysql->hdfs TaskStartLifecycleEvent{task=mysql->hdfs} with state SUBMITTED_SUCCESS
[WI-14481][TI-0] - 2025-11-26 02:28:06.486 INFO [ds-workflow-eventbus-worker-3] o.a.d.s.m.e.t.d.WorkerGroupDispatcher:[56] - Initialize WorkerGroupDispatcher: WorkerGroupTaskDispatcher-default
[WI-14481][TI-0] - 2025-11-26 02:28:06.486 INFO [ds-workflow-eventbus-worker-3] o.a.d.s.m.e.t.d.WorkerGroupDispatcher:[62] - The WorkerGroupTaskDispatcher-default starting...
[WI-14481][TI-0] - 2025-11-26 02:28:06.486 INFO [ds-workflow-eventbus-worker-3] o.a.d.s.m.e.t.d.WorkerGroupDispatcher:[64] - The WorkerGroupTaskDispatcher-default started
[WI-14481][TI-0] - 2025-11-26 02:28:06.486 INFO [ds-workflow-eventbus-worker-3] o.a.d.s.m.e.t.d.WorkerGroupDispatcherCoordinator:[59] - Success add Task[id=55958] to WorkerGroupDispatcher[name=default]
[WI-14481][TI-0] - 2025-11-26 02:28:06.486 INFO [ds-workflow-eventbus-worker-3] o.a.d.s.m.e.t.l.h.AbstractTaskLifecycleEventHandler:[47] - Fired task mysql->hdfs TaskDispatchLifecycleEvent{task=mysql->hdfs} with state SUBMITTED_SUCCESS
[WI-14481][TI-55958] - 2025-11-26 02:28:06.522 INFO [WorkerGroupTaskDispatcher-default] o.a.d.e.b.c.JdkDynamicRpcClientProxyFactory:[56] - Create DynamicRpcClientProxy cache for host: 10.16.10.117:1234
[WI-0][TI-0] - 2025-11-26 02:28:06.576 INFO [MasterRpcServer-methodInvoker-1] o.a.d.s.m.e.WorkflowEventBus:[41] - Publish event: TaskDispatchedLifecycleEvent{task=mysql->hdfs, 
executorHost='10.16.10.117:1234'}
[WI-14481][TI-0] - 2025-11-26 02:28:06.599 INFO [ds-workflow-eventbus-worker-2] o.a.d.s.m.e.t.l.h.AbstractTaskLifecycleEventHandler:[47] - Fired task mysql->hdfs TaskDispatchedLifecycleEvent{task=mysql->hdfs, executorHost='10.16.10.117:1234'} with state SUBMITTED_SUCCESS
[WI-0][TI-0] - 2025-11-26 02:28:06.622 INFO [MasterRpcServer-methodInvoker-2] o.a.d.s.m.e.WorkflowEventBus:[41] - Publish event: TaskRunningLifecycleEvent{task=mysql->hdfs, logPath='/opt/datasophon/dolphinscheduler-3.3.2/worker-server/logs/20251126/18819871298950/19/14481/55958.log', startTime=Wed Nov 26 02:28:06 UTC 2025}
[WI-14481][TI-0] - 2025-11-26 02:28:06.724 INFO [ds-workflow-eventbus-worker-16] o.a.d.s.m.e.t.l.h.AbstractTaskLifecycleEventHandler:[47] - Fired task mysql->hdfs TaskRunningLifecycleEvent{task=mysql->hdfs, logPath='/opt/datasophon/dolphinscheduler-3.3.2/worker-server/logs/20251126/18819871298950/19/14481/55958.log', startTime=Wed Nov 26 02:28:06 UTC 2025} with state DISPATCH
[WI-0][TI-0] - 2025-11-26 02:28:06.727 INFO [MasterRpcServer-methodInvoker-3] o.a.d.s.m.e.WorkflowEventBus:[41] - Publish event: TaskRunningLifecycleEvent{task=mysql->hdfs, runtimeContext=null}
[WI-14481][TI-0] - 2025-11-26 02:28:06.830 INFO [ds-workflow-eventbus-worker-14] o.a.d.s.m.e.t.l.h.AbstractTaskLifecycleEventHandler:[47] - Fired task mysql->hdfs TaskRunningLifecycleEvent{task=mysql->hdfs, runtimeContext=null} with state RUNNING_EXECUTION
[WI-0][TI-0] - 2025-11-26 02:28:07.574 INFO [MasterRpcServer-methodInvoker-4] o.a.d.s.m.e.WorkflowEventBus:[41] - Publish event: TaskSuccessLifecycleEvent{task=mysql->hdfs, endTime=Wed Nov 26 02:28:07 UTC 2025, varPool='[Property(prop=fLines, direct=OUT, type=VARCHAR, value=${sCount}), Property(prop=newLineColNums, direct=OUT, type=VARCHAR, value=)]'}
[WI-14481][TI-0] - 2025-11-26 02:28:07.641 INFO [ds-workflow-eventbus-worker-15] o.a.d.s.m.e.WorkflowEventBus:[41] - Publish event: 
WorkflowTopologyLogicalTransitionWithTaskFinishLifecycleEvent{task=mysql->hdfstaskState=SUCCESS}
[WI-14481][TI-0] - 2025-11-26 02:28:07.644 INFO [ds-workflow-eventbus-worker-15] o.a.d.s.m.e.t.l.h.AbstractTaskLifecycleEventHandler:[47] - Fired task mysql->hdfs TaskSuccessLifecycleEvent{task=mysql->hdfs, endTime=Wed Nov 26 02:28:07 UTC 2025, varPool='[Property(prop=fLines, direct=OUT, type=VARCHAR, value=${sCount}), Property(prop=newLineColNums, direct=OUT, type=VARCHAR, value=)]'} with state RUNNING_EXECUTION
[WI-14481][TI-0] - 2025-11-26 02:28:07.644 INFO [ds-workflow-eventbus-worker-15] o.a.d.s.m.e.w.l.h.AbstractWorkflowLifecycleEventHandler:[47] - Begin fire workflow 03.CUST_加载(dim)_fr_project_code_dealer-v2-20251126022806126 LifecycleEvent[WorkflowTopologyLogicalTransitionWithTaskFinishLifecycleEvent{task=mysql->hdfstaskState=SUCCESS}] with state: RUNNING_EXECUTION
[WI-14481][TI-0] - 2025-11-26 02:28:07.644 INFO [ds-workflow-eventbus-worker-15] o.a.d.s.m.e.WorkflowEventBus:[41] - Publish event: TaskStartLifecycleEvent{task=加载到临时表hive}
[WI-14481][TI-0] - 2025-11-26 02:28:07.644 INFO [ds-workflow-eventbus-worker-15] o.a.d.s.m.e.w.l.h.AbstractWorkflowLifecycleEventHandler:[52] - Fired workflow 03.CUST_加载(dim)_fr_project_code_dealer-v2-20251126022806126 LifecycleEvent[WorkflowTopologyLogicalTransitionWithTaskFinishLifecycleEvent{task=mysql->hdfstaskState=SUCCESS}] with state: RUNNING_EXECUTION
[WI-14481][TI-0] - 2025-11-26 02:28:07.653 INFO [ds-workflow-eventbus-worker-15] o.a.d.s.m.e.TaskGroupCoordinator:[363] - Success insert TaskGroupQueue: TaskGroupQueue(id=null, taskId=55959, taskName=加载到临时表hive, projectName=null, projectCode=null, workflowInstanceName=null, groupId=1, workflowInstanceId=14481, priority=0, forceStart=0, inQueue=1, status=WAIT_QUEUE, createTime=Wed Nov 26 02:28:07 UTC 2025, updateTime=Wed Nov 26 02:28:07 UTC 2025) for TaskInstance: 加载到临时表hive
[WI-14481][TI-0] - 2025-11-26 02:28:07.662 INFO [ds-workflow-eventbus-worker-15] 
o.a.d.s.m.e.t.s.AbstractTaskStateAction:[238] - Task[name=加载到临时表hive] using taskGroup, success acquire taskGroup slot
[WI-14481][TI-0] - 2025-11-26 02:28:07.662 INFO [ds-workflow-eventbus-worker-15] o.a.d.s.m.e.t.l.h.AbstractTaskLifecycleEventHandler:[47] - Fired task 加载到临时表hive TaskStartLifecycleEvent{task=加载到临时表hive} with state SUBMITTED_SUCCESS

<img width="774" height="671" alt="Image" src="https://github.com/user-attachments/assets/b816a660-e18f-475d-81c8-1a71cd25cc03" />
<img width="714" height="677" alt="Image" src="https://github.com/user-attachments/assets/50de7b72-e6bd-40bf-af0b-d89f7c40b107" />

### What you expected to happen

The workflow should continue running.

### How to reproduce

Start a workflow.

### Anything else

The workflow frequently gets stuck at different nodes, and it cannot be stopped either.

### Version

dev

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
