dahai1996 opened a new pull request, #12584:
URL: https://github.com/apache/dolphinscheduler/pull/12584

   
   ## Purpose of the pull request
   When a task group is used for jobs, we hit a bug. Here is the log:
   ```
   [INFO] 2022-10-25 09:00:00.140 +0800 org.apache.dolphinscheduler.service.process.ProcessServiceImpl:[2929] - [WorkflowInstance-798][TaskInstance-28719] - Failed to rob taskGroup, taskInstanceId: 28719, taskGroupId: 26497
   [INFO] 2022-10-25 09:00:00.140 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[269] - [WorkflowInstance-798][TaskInstance-28719] - Begin to handle state event, StateEvent(key=798-28719, type=WAIT_TASK_GROUP, executionStatus=null, taskInstanceId=28719, taskCode=0, processInstanceId=798, context=null, channel=null)
   [INFO] 2022-10-25 09:00:00.159 +0800 org.apache.dolphinscheduler.service.process.ProcessServiceImpl:[2929] - [WorkflowInstance-798][TaskInstance-28719] - Failed to rob taskGroup, taskInstanceId: 28719, taskGroupId: 26497
   [INFO] 2022-10-25 09:00:00.159 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[269] - [WorkflowInstance-798][TaskInstance-28719] - Begin to handle state event, StateEvent(key=798-28719, type=WAIT_TASK_GROUP, executionStatus=null, taskInstanceId=28719, taskCode=0, processInstanceId=798, context=null, channel=null)
   [INFO] 2022-10-25 09:00:00.178 +0800 org.apache.dolphinscheduler.service.process.ProcessServiceImpl:[2929] - [WorkflowInstance-798][TaskInstance-28719] - Failed to rob taskGroup, taskInstanceId: 28719, taskGroupId: 26497
   [INFO] 2022-10-25 09:00:00.179 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteRunnable:[269] - [WorkflowInstance-798][TaskInstance-28719] - Begin to handle state event, StateEvent(key=798-28719, type=WAIT_TASK_GROUP, executionStatus=null, taskInstanceId=28719, taskCode=0, processInstanceId=798, context=null, channel=null)
   ```
   These log entries keep recurring.
   As a result, the task group ends up with an incorrect task instance status.
   I found that this is caused by duplicate messages, so this PR adds code to check for duplicates.
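
For readers unfamiliar with the idea, a duplicate check of this kind can be sketched as an idempotent acquire. This is only an illustration and not the code in this PR: the class `TaskGroupSketch` and its methods `acquire`, `release`, and `used` are hypothetical names, and the real logic lives in `ProcessServiceImpl` and works against the database rather than an in-memory set. The sketch assumes that a repeated "rob taskGroup" message for a task instance that already holds a slot should be treated as a success without consuming a second slot.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not DolphinScheduler code.
final class TaskGroupSketch {
    private final int groupSize;                            // maximum number of slots in the task group
    private final Set<Integer> holders = new HashSet<>();   // task instance ids that currently hold a slot

    TaskGroupSketch(int groupSize) {
        this.groupSize = groupSize;
    }

    // Returns true if the task instance holds a slot after this call.
    synchronized boolean acquire(int taskInstanceId) {
        if (holders.contains(taskInstanceId)) {
            // Duplicate message: the slot was already taken by this task instance,
            // so do not try to rob the task group again.
            return true;
        }
        if (holders.size() >= groupSize) {
            // No free slot: the caller keeps waiting.
            return false;
        }
        holders.add(taskInstanceId);
        return true;
    }

    // Frees the slot held by the task instance, if any.
    synchronized void release(int taskInstanceId) {
        holders.remove(taskInstanceId);
    }

    // Number of slots currently in use.
    synchronized int used() {
        return holders.size();
    }
}
```

With a group of size 1, calling `acquire(28719)` twice succeeds both times while `used()` stays at 1, so a repeated message neither fails nor takes an extra slot.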
   
   ## Verify this pull request
   This bug is hard to reproduce: it only happens when the "rob taskGroup" message is received repeatedly. (Maybe the duplicate message is itself a bug?)
   After a few days, I got the following logs from an actual run:
   ```
   [INFO] 2022-10-28 04:02:51.179 +0800 org.apache.dolphinscheduler.server.master.processor.TaskEventProcessor:[64] - [WorkflowInstance-852][TaskInstance-32214] - Received task event change command, event: StateEvent(key=852-32214, type=WAIT_TASK_GROUP, executionStatus=null, taskInstanceId=32214, taskCode=0, processInstanceId=852, context=null, channel=null)
   [INFO] 2022-10-28 04:02:51.179 +0800 org.apache.dolphinscheduler.server.master.runner.WorkflowExecuteThreadPool:[98] - [WorkflowInstance-852][TaskInstance-32214] - Submit state event success, stateEvent: StateEvent(key=852-32214, type=WAIT_TASK_GROUP, executionStatus=null, taskInstanceId=32214, taskCode=0, processInstanceId=852, context=null, channel=null)
   [INFO] 2022-10-28 04:02:51.197 +0800 org.apache.dolphinscheduler.service.process.ProcessServiceImpl:[2942] - [WorkflowInstance-854][TaskInstance-32211] - This is a duplicate message,will not rob taskGroup, taskInstanceId: 32211, taskGroupId: 29988
   [INFO] 2022-10-28 04:02:51.198 +0800 TaskLogLogger-class org.apache.dolphinscheduler.server.master.runner.task.CommonTaskProcessor:[94] - [WorkflowInstance-854][TaskInstance-32211] - task ready to dispatch to worker: taskInstanceId: 32211
   ```
   
   

