ruanwenjun commented on PR #9955:
URL: 
https://github.com/apache/dolphinscheduler/pull/9955#issuecomment-1173316182

   > > If you want to reduce the dispatchFailed log, I have fixed the sleep 
logic in #10631. If there is no worker, the master will not dead loop now.
   > 
   > > In fact, I don't think we need to use a dispatch failed queue and set 
the maxDispatchSize for a task.
   > > Dispatch failed due to the worker network error, this should have 
nothing to do with the task. So the correct thing is to find the dispatch 
failed worker, and `separate` it rather than separate the task.
   > > And I remember in the current design when a task dispatch failed, it 
will go back to the task queue, and retry-retry, I think this is reasonable.
   > > One possible thing I think we may need to do is split the task state 
`Dispatch` into `Dispatching` and `Dispatched`.
   > 
   > thank you @ruanwenjun , the Pending state of the workflow instance and the 
dispatch failure queue are added to solve three problems:
   > 
   > 1. The dispatch failure status is displayed to the user, who no longer 
has to watch the logs or stay unclear about the real status of the current 
task.
   
   This is a good point; we can split the status into `Dispatching` and 
`Dispatched`. BTW, this case may occur for all of our statuses.
   
   > 2. The dispatch failure queue is used to avoid a high-priority task 
occupying the head of the task queue, which happened before when a failed task 
was put back into it.
   
   When a dispatch fails, the reason is that there is no worker or the worker's 
network is broken. The correct thing is to not consume the command. Using a 
failure queue can mitigate the problem, but it is essentially the same solution 
as the current plan: put the task back into the normal queue and sleep.
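
   To make the "put it back and sleep" plan concrete, here is a rough sketch. 
It is not the actual DolphinScheduler code: `RequeueAndSleepSketch`, `Task`, 
the `dispatcher` predicate and `sleepMillis` are all made-up names for 
illustration.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Sketch only: every name here is hypothetical, not a DolphinScheduler class.
class RequeueAndSleepSketch {

    /** Hypothetical task; a lower priority value is consumed first. */
    static final class Task implements Comparable<Task> {
        final String name;
        final int priority;

        Task(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }

        @Override
        public int compareTo(Task other) {
            return Integer.compare(priority, other.priority);
        }
    }

    private final PriorityBlockingQueue<Task> queue = new PriorityBlockingQueue<>();
    // Assumed to return true when some worker accepted the task.
    private final Predicate<Task> dispatcher;
    private final long sleepMillis;

    RequeueAndSleepSketch(Predicate<Task> dispatcher, long sleepMillis) {
        this.dispatcher = dispatcher;
        this.sleepMillis = sleepMillis;
    }

    void submit(Task task) {
        queue.put(task);
    }

    int pending() {
        return queue.size();
    }

    /** Tries to dispatch the head task; returns true only if it was dispatched. */
    boolean consumeOnce() {
        Task task = queue.poll();
        if (task == null) {
            return false; // nothing to dispatch
        }
        if (dispatcher.test(task)) {
            return true;
        }
        // Dispatch failed: put the task back into the normal queue and back off,
        // so the loop does not spin while no worker is reachable.
        queue.put(task);
        try {
            TimeUnit.MILLISECONDS.sleep(sleepMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }
}
```

   The task never leaves the normal queue for long, so no second queue or extra 
thread is needed; the sleep is what keeps the log from flooding.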
   
   When you put it into a failure queue, you need another thread to handle it; 
otherwise you may affect the normal process.
   https://github.com/apache/dolphinscheduler/pull/9955#discussion_r912302035
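
   For comparison, a failure queue with its own retry thread could look roughly 
like the sketch below. Again, this is only an assumption about the shape of 
such a design; `FailureQueueSketch`, `retryFailedOnce` and the `dispatcher` 
predicate are hypothetical names, not code from this PR.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Sketch only: every name here is hypothetical, not a DolphinScheduler class.
class FailureQueueSketch {

    private final BlockingQueue<String> failed = new LinkedBlockingQueue<>();
    // Assumed to return true when some worker accepted the task.
    private final Predicate<String> dispatcher;
    // Dedicated daemon thread so retries never run on the normal dispatch thread.
    private final ScheduledExecutorService retryThread =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "dispatch-retry");
                t.setDaemon(true);
                return t;
            });

    FailureQueueSketch(Predicate<String> dispatcher) {
        this.dispatcher = dispatcher;
    }

    /** Called from the normal dispatch loop; a failure only enqueues the task. */
    boolean dispatch(String task) {
        if (dispatcher.test(task)) {
            return true;
        }
        failed.offer(task);
        return false;
    }

    /** One retry pass over the tasks queued at call time; returns how many recovered. */
    int retryFailedOnce() {
        int recovered = 0;
        int snapshot = failed.size();
        for (int i = 0; i < snapshot; i++) {
            String task = failed.poll();
            if (task == null) {
                break;
            }
            if (dispatcher.test(task)) {
                recovered++;
            } else {
                failed.offer(task); // still failing, keep it for the next pass
            }
        }
        return recovered;
    }

    /** Runs retry passes periodically on the dedicated thread. */
    void start(long periodMillis) {
        retryThread.scheduleWithFixedDelay(
                this::retryFailedOnce, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        retryThread.shutdownNow();
    }

    int failedCount() {
        return failed.size();
    }
}
```

   Note the extra moving parts (second queue, second thread, its lifecycle) 
compared with simply putting the task back and sleeping.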
   
   > 3. Invalid log printing that keeps looping after worker dispatch fails.
   
   This problem is fixed by the sleep logic in #10631.
   
   > 
   > based on these three points, I think your question can be explained and is 
consistent with the present.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
