ruanwenjun commented on PR #9955: URL: https://github.com/apache/dolphinscheduler/pull/9955#issuecomment-1173316182
> > If you want to reduce the dispatchFailed log, I have fixed the sleep logic in #10631. If there is no worker, the master will not enter an infinite loop now.
> >
> > In fact, I don't think we need to use a dispatch failed queue and set the maxDispatchSize for a task.
> > A dispatch fails because of a worker network error, which has nothing to do with the task. So the correct thing is to find the worker that failed the dispatch and `separate` it, rather than separate the task.
> > And I remember that in the current design, when a task dispatch fails, it goes back to the task queue and is retried again and again; I think this is reasonable.
> > One thing I think we may need to do is split the task state `Dispatch` into `Dispatching` and `Dispatched`.
>
> Thank you @ruanwenjun, the Pending state of the workflow instance and the dispatch failure queue are added to solve three problems:
>
> 1. The status of dispatch failure is displayed to the user, who no longer depends on observing the log or is unclear about the real status of the current task.

This is a good point; we can set the status to dispatching and dispatched. BTW, this case may occur in all of our statuses.

> 2. The dispatch failure queue is used to avoid the high-priority head-of-queue occupancy that occurred when the task was put back into the task queue before.

When a dispatch fails, the reason is that there is no worker or the worker's network is broken, so the correct thing is to stop consuming commands. Although a failure queue can mitigate this problem, it is the same solution as the current plan: put the task back into the normal queue and sleep. When you put it into a failure queue, you need another thread to handle it; otherwise, you may influence the normal process. https://github.com/apache/dolphinscheduler/pull/9955#discussion_r912302035

> 3. Invalid log printing that keeps looping after worker dispatch fails.

This problem is fixed by the sleep.

> Based on these three points, I think your question can be explained and is consistent with the present.
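The "put it back into the normal queue and sleep" behaviour described in the reply to point 2 can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not DolphinScheduler code: `dispatch_loop`, `make_flaky_worker`, and every parameter name are hypothetical and chosen only for this example.

```python
import heapq
import time


def make_flaky_worker(failures):
    """Hypothetical worker stub: rejects the first `failures` dispatch attempts."""
    state = {"remaining": failures}

    def try_dispatch(task_name):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            return False  # simulates "no worker" or a broken worker network
        return True

    return try_dispatch


def dispatch_loop(tasks, try_dispatch, sleep_seconds=1.0, max_rounds=100, sleep=time.sleep):
    """Consume a priority queue of (priority, name) tuples.

    On a failed dispatch the task goes back into the same queue and the loop
    sleeps, so it neither spins nor floods the log while workers are down.
    Returns the names dispatched (in order) and the number of sleeps taken.
    """
    queue = list(tasks)
    heapq.heapify(queue)  # lowest priority value is served first
    dispatched = []
    sleeps = 0
    rounds = 0
    while queue and rounds < max_rounds:
        rounds += 1
        priority, name = heapq.heappop(queue)
        if try_dispatch(name):
            dispatched.append(name)
        else:
            # Failure is attributed to the worker, not the task:
            # re-queue the task unchanged and back off before retrying.
            heapq.heappush(queue, (priority, name))
            sleep(sleep_seconds)
            sleeps += 1
    return dispatched, sleeps


if __name__ == "__main__":
    worker = make_flaky_worker(2)
    print(dispatch_loop([(1, "b"), (0, "a")], worker, sleep_seconds=0.01))
```

In this sketch the highest-priority task stays at the head of the queue while the worker is unavailable, which is the head-of-queue occupancy raised in point 2. The sleep does not remove that occupancy; it only stops the loop from retrying at full speed until a worker is back.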
