dahai1996 opened a new pull request, #12584: URL: https://github.com/apache/dolphinscheduler/pull/12584
## Purpose of the pull request

When using a task group for jobs, we hit a bug. Here is the log:

```
[INFO] 2022-10-25 09:00:00.140 +0800 org.apache.dolphinscheduler.service.process.ProcessServiceImpl:[2929] - [WorkflowInstance-798][TaskInstance-28719] - Failed to rob taskGroup, taskInstanceId: 28719, taskGroupId: 26497
[INFO] 2022-10-25 09:00:00.140 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[269] - [WorkflowInstance-798][TaskInstance-28719] - Begin to handle state event, StateEvent(key=798-28719, type=WAIT_TASK_GROUP, executionStatus=null, taskInstanceId=28719, taskCode=0, processInstanceId=798, context=null, channel=null)
[INFO] 2022-10-25 09:00:00.159 +0800 org.apache.dolphinscheduler.service.process.ProcessServiceImpl:[2929] - [WorkflowInstance-798][TaskInstance-28719] - Failed to rob taskGroup, taskInstanceId: 28719, taskGroupId: 26497
[INFO] 2022-10-25 09:00:00.159 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[269] - [WorkflowInstance-798][TaskInstance-28719] - Begin to handle state event, StateEvent(key=798-28719, type=WAIT_TASK_GROUP, executionStatus=null, taskInstanceId=28719, taskCode=0, processInstanceId=798, context=null, channel=null)
[INFO] 2022-10-25 09:00:00.178 +0800 org.apache.dolphinscheduler.service.process.ProcessServiceImpl:[2929] - [WorkflowInstance-798][TaskInstance-28719] - Failed to rob taskGroup, taskInstanceId: 28719, taskGroupId: 26497
[INFO] 2022-10-25 09:00:00.179 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[269] - [WorkflowInstance-798][TaskInstance-28719] - Begin to handle state event, StateEvent(key=798-28719, type=WAIT_TASK_GROUP, executionStatus=null, taskInstanceId=28719, taskCode=0, processInstanceId=798, context=null, channel=null)
```

These two log entries keep recurring, roughly every 20 ms. This causes the task group to get the wrong status for the task instance.
I found that this was caused by duplicate messages, so this PR adds code to check for duplicates.

## Verify this pull request

This bug is hard to reproduce: it only happens when the "rob taskGroup" message is received repeatedly (the duplicate message may itself be a bug). After a few days, I got the following logs from an actual run:

```
[INFO] 2022-10-28 04:02:51.179 +0800 org.apache.dolphinscheduler.server.master.processor.TaskEventProcessor:[64] - [WorkflowInstance-852][TaskInstance-32214] - Received task event change command, event: StateEvent(key=852-32214, type=WAIT_TASK_GROUP, executionStatus=null, taskInstanceId=32214, taskCode=0, processInstanceId=852, context=null, channel=null)
[INFO] 2022-10-28 04:02:51.179 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThreadPool:[98] - [WorkflowInstance-852][TaskInstance-32214] - Submit state event success, stateEvent: StateEvent(key=852-32214, type=WAIT_TASK_GROUP, executionStatus=null, taskInstanceId=32214, taskCode=0, processInstanceId=852, context=null, channel=null)
[INFO] 2022-10-28 04:02:51.197 +0800 org.apache.dolphinscheduler.service.process.ProcessServiceImpl:[2942] - [WorkflowInstance-854][TaskInstance-32211] - This is a duplicate message,will not rob taskGroup, taskInstanceId: 32211, taskGroupId: 29988
[INFO] 2022-10-28 04:02:51.198 +0800 TaskLogLogger-class org.apache.dolphinscheduler.server.master.runner.task.CommonTaskProcessor:[94] - [WorkflowInstance-854][TaskInstance-32211] - task ready to dispatch to worker: taskInstanceId: 32211
```
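For readers who want the general idea of a duplicate check without opening the diff, here is a minimal standalone sketch. It is not the code in this PR and it does not use any DolphinScheduler classes: `TaskGroupSlots`, `Result`, `tryAcquire`, `release` and `used` are hypothetical names invented for illustration, and the assumption is simply that a repeated request from a task instance that already holds a slot should be recognized instead of being handled as a fresh attempt.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a task group's slot bookkeeping; not DolphinScheduler code. */
class TaskGroupSlots {

    /** Possible outcomes of a request to take a slot. */
    enum Result { ACQUIRED, DUPLICATE, FULL }

    private final int capacity;
    // Ids of the task instances that currently hold a slot in this group.
    private final Set<Integer> holders = new HashSet<>();

    TaskGroupSlots(int capacity) {
        this.capacity = capacity;
    }

    /**
     * Tries to take a slot for the given task instance.
     * A repeated request from an instance that already holds a slot is
     * reported as DUPLICATE and does not consume another slot.
     */
    synchronized Result tryAcquire(int taskInstanceId) {
        if (holders.contains(taskInstanceId)) {
            return Result.DUPLICATE;
        }
        if (holders.size() >= capacity) {
            return Result.FULL;
        }
        holders.add(taskInstanceId);
        return Result.ACQUIRED;
    }

    /** Frees the slot held by the given task instance, if it holds one. */
    synchronized void release(int taskInstanceId) {
        holders.remove(taskInstanceId);
    }

    /** Number of slots currently in use. */
    synchronized int used() {
        return holders.size();
    }
}
```

In this sketch a caller would treat `DUPLICATE` as "already handled" and carry on, rather than re-queueing a wait event for the same task instance. How the real fix decides that a message is a duplicate is defined by the changes in this PR, not by this example.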
