2421994771 commented on issue #14813:
URL:
https://github.com/apache/dolphinscheduler/issues/14813#issuecomment-1920716898
When a high-priority task is in the wake-up stage and any exception occurs
after IN_QUEUE has been set to 1, IN_QUEUE is never reset to 0. From then on,
no task with a lower priority can be consumed, even if the queue is empty.
In my case the cause was a communication error while sending the WAKE_UP
message to the master after IN_QUEUE had already been set to 1.
So far I have found two bugs in the task group feature:
1. A communication error makes the task wake-up fail, and a task whose
wake-up failed is never woken up again.
2. The failed wake-up leaves records with status = -1 and in_queue = 1 in
t_ds_task_group_queue, and such a record prevents every task with a lower
priority than the record from acquiring queue resources.
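
The failure mode described above can be illustrated with a minimal sketch. This is not DolphinScheduler code (the project is written in Java); `send_wake_up`, `CommunicationError`, and both `wake_up_*` functions are hypothetical names used only to show why the flag stays at 1 when the send fails, and what resetting it on failure would look like.

```python
class CommunicationError(Exception):
    """Hypothetical stand-in for a failed RPC to the master."""


def send_wake_up(record):
    # Simulates the WAKE_UP message failing in transit.
    raise CommunicationError("master unreachable")


def wake_up_unguarded(record):
    record["in_queue"] = 1  # flag is set first
    send_wake_up(record)    # raises, so nothing resets the flag


def wake_up_guarded(record):
    record["in_queue"] = 1
    try:
        send_wake_up(record)
    except CommunicationError:
        record["in_queue"] = 0  # reset so the record can be picked up again
        raise


def run(wake_up):
    # A waiting record, using the status/in_queue values from the report.
    record = {"status": -1, "in_queue": 0, "priority": 10}
    try:
        wake_up(record)
    except CommunicationError:
        pass
    return record


stuck = run(wake_up_unguarded)
recovered = run(wake_up_guarded)
print("unguarded:", stuck)
print("guarded:  ", recovered)
```

In the unguarded path the record ends up with status = -1 and in_queue = 1, which is the stuck state described in point 2.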
Temporary workaround:
Use SQL to find the records in the t_ds_task_group_queue table that are
associated with the running workflow and have status = -1 and in_queue = 1,
then update in_queue to 0 for those records so that they can be consumed
again.
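
A rough sketch of that workaround follows, run against an in-memory SQLite table so it is self-contained. The table name and the `status` and `in_queue` columns come from the description above. The `process_id` column (used here to link a record to its workflow instance), the reduced schema, and the sample rows are assumptions, so check them against your actual database before running anything similar in production.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed schema: only the columns the workaround touches.
conn.execute(
    "CREATE TABLE t_ds_task_group_queue ("
    " id INTEGER PRIMARY KEY, process_id INTEGER, priority INTEGER,"
    " status INTEGER, in_queue INTEGER)"
)
conn.executemany(
    "INSERT INTO t_ds_task_group_queue"
    " (id, process_id, priority, status, in_queue) VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, 9, -1, 1),  # stuck record of the running workflow
        (2, 100, 1, -1, 0),  # lower-priority record waiting behind it
        (3, 200, 5, -1, 1),  # belongs to another workflow, left untouched
    ],
)

RUNNING_PROCESS_ID = 100  # hypothetical id of the workflow being executed

# Step 1: find the stuck records for that workflow.
stuck_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM t_ds_task_group_queue"
        " WHERE process_id = ? AND status = -1 AND in_queue = 1",
        (RUNNING_PROCESS_ID,),
    )
]

# Step 2: reset in_queue so the records can be consumed again.
cur = conn.execute(
    "UPDATE t_ds_task_group_queue SET in_queue = 0"
    " WHERE process_id = ? AND status = -1 AND in_queue = 1",
    (RUNNING_PROCESS_ID,),
)
conn.commit()
reset_count = cur.rowcount

remaining = conn.execute(
    "SELECT id, in_queue FROM t_ds_task_group_queue ORDER BY id"
).fetchall()
print(stuck_ids, reset_count, remaining)
```

Running the SELECT first and reviewing its result before issuing the UPDATE keeps the change limited to the records that are actually stuck.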