crazychengmm opened a new issue, #17884: URL: https://github.com/apache/dolphinscheduler/issues/17884
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.

### What happened

Describe the bug

In a multi-master environment (DolphinScheduler 3.2.0), when a workflow containing a SubProcess task is executed, the Master logs report a NullPointerException in TaskStateEventHandler. The TaskStateEvent is broken because the taskCode is 0 and the status is null. This prevents the parent workflow from progressing, and the Master falls into an infinite retry loop for this event.

Reproducibility

- 100% Reproducible: This issue happens every time we run a workflow with a SubProcess in a multi-master setup.
- Single Master Test: When we scale down to a Single Master node, the issue disappears completely, and the same workflow finishes successfully. This confirms it is a synchronization or metadata visibility issue specific to the Multi-Master architecture.

Environment:

- DolphinScheduler Version: 3.2.0
- OS: Linux
- Java Version: Java version "1.8.0_202"
- Database: MySQL
- Deployment Mode: Cluster (Multiple Master Servers)

Log Snippet:

    [INFO] 2026-01-15 17:11:38.720 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[292] - [WorkflowInstance-8249][TaskInstance-2445040] - Begin to handle state event, TaskStateEvent(processInstanceId=8249, taskInstanceId=2445040, taskCode=0, status=null, type=TASK_STATE_CHANGE, key=8250-0-8249-2445040, channel=null, context=null)
    [WARN] 2026-01-15 17:11:38.720 +0800 org.apache.dolphinscheduler.server.master.event.TaskStateEventHandler:[96] - [WorkflowInstance-8249][TaskInstance-2445040] - The task event is broken..., taskEvent: TaskStateEvent(processInstanceId=8249, taskInstanceId=2445040, taskCode=0, status=null, type=TASK_STATE_CHANGE, key=8250-0-8249-2445040, channel=null, context=null)
    [ERROR] 2026-01-15 17:11:38.720 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[317] - [WorkflowInstance-8249][TaskInstance-2445040] - State event handle error, get a unknown exception, will retry this event: TaskStateEvent(processInstanceId=8249, taskInstanceId=2445040, taskCode=0, status=null, type=TASK_STATE_CHANGE, key=8250-0-8249-2445040, channel=null, context=null)
    java.lang.NullPointerException: null
        at org.apache.dolphinscheduler.server.master.event.TaskStateEventHandler.handleStateEvent(TaskStateEventHandler.java:56)
        at org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable.handleEvents(WorkflowExecuteRunnable.java:293)
        ...

Steps to Reproduce:

1. Deploy DolphinScheduler 3.2.0 with 2 or more Master nodes.
2. Create a SubProcess workflow.
3. Create a Parent workflow containing a SubProcess node.
4. Run the Parent workflow.
5. Once the SubProcess completes, the NPE will be triggered in the Master managing the Parent workflow.

Questions:

- Is this a known issue in the 3.2.0 release related to event distribution between Masters?
- If 3.2.0 is no longer the recommended stable version, could you please advise which version (e.g., 3.2.1, 3.2.2, or 3.3.x) contains the fix for this specific SubProcess callback issue?

Expected Behavior:

In a Multi-Master environment, the Master node should be able to correctly reconstruct the TaskStateEvent with the valid taskCode and status when a SubProcess completes.

### What you expected to happen

1. Workaround: Since we are stuck on 3.2.0, is there any configuration change or workflow design adjustment (e.g., using different node types) that can bypass this NPE in a multi-master setup?
2. Fixed Version: Which specific version (3.2.1, 3.2.2, or 3.3.x) officially fixes this taskCode=0 and status=null issue for SubProcess callbacks? I will use that version to verify the fix in our UAT environment.

### How to reproduce

1. Deploy DolphinScheduler 3.2.0 with 3 Master nodes (HA).
2. Use MySQL as the database (JDK 1.8.0_202).
3. Create a workflow with a SubProcess node (pointing to a valid child workflow).
4. Run the parent workflow.
5. Observe Master logs: the issue happens every time in our setup.

### Anything else

_No response_

### Version

dev

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
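
### Diagnostic sketch

To help confirm how often the broken event appears on each Master, the following is a minimal sketch that scans a Master log line for the `TaskStateEvent(...)` text shown in the log snippet and flags the reported signature (`taskCode=0` together with `status=null`). `BrokenEventScanner`, `parseEvent`, and `isBroken` are hypothetical names and are not part of DolphinScheduler. The sketch assumes the event is printed in the same `key=value, key=value` form as in the log above, and it sticks to Java 8 syntax to match the reported environment.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical log-triage helper; not part of DolphinScheduler. */
final class BrokenEventScanner {

    // Matches "TaskStateEvent(k=v, k=v, ...)" as printed in the Master log.
    // Assumption: field values never contain ')' or ','.
    private static final Pattern EVENT = Pattern.compile("TaskStateEvent\\(([^)]*)\\)");

    /** Parses the first TaskStateEvent(...) in a log line; returns an empty map if none is found. */
    static Map<String, String> parseEvent(String logLine) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = EVENT.matcher(logLine);
        if (!m.find()) {
            return fields;
        }
        for (String pair : m.group(1).split(",\\s*")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                fields.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return fields;
    }

    /** True when the line carries the signature from this report: taskCode=0 and status=null. */
    static boolean isBroken(String logLine) {
        Map<String, String> f = parseEvent(logLine);
        return "0".equals(f.get("taskCode")) && "null".equals(f.get("status"));
    }

    public static void main(String[] args) {
        String line = "[WARN] The task event is broken..., taskEvent: "
                + "TaskStateEvent(processInstanceId=8249, taskInstanceId=2445040, taskCode=0, "
                + "status=null, type=TASK_STATE_CHANGE, key=8250-0-8249-2445040, "
                + "channel=null, context=null)";
        System.out.println(parseEvent(line));
        System.out.println(isBroken(line));
    }
}
```

Running this over each Master's log separately and grouping the hits by `processInstanceId` could show whether the broken events only appear on the Master that owns the parent workflow, which is what the report suggests.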
