Jason Lowe commented on TEZ-3893:

Thanks for the patch!

A lot of the fragility in this code stems from the fact that there are items in 
the queue that we can process and items we cannot, and we're trying to juggle 
them in the same queue.  I'm wondering if this gets a lot cleaner if it is 
refactored into two parts, a front-end dispatcher/handler and a fixed-size 
thread pool executor to do the executions.  The front-end _always_ pulls from 
the queue (just FIFO, not priority).  If the message is an allocate, the 
dispatcher schedules the task with the fixed thread pool executor and tracks 
the resulting Future in a map.  If the message is a deallocate, it looks up 
the Future in the map and cancels it.  That prevents the task from executing 
if it hasn't started yet, or should interrupt the thread currently executing 
it if it has.
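The cancellation behaviour relied on here can be checked with plain java.util.concurrent, nothing Tez-specific. This is a minimal sketch: the class name and the single-thread pool are illustrative assumptions, and it assumes cancellation is done with cancel(true) so that a running task gets interrupted.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical demo of Future.cancel(true) on a fixed-size pool. */
public class FutureCancelDemo {
    /** Returns {queuedTaskRan, runningTaskWasInterrupted}. */
    static boolean[] demo() throws InterruptedException {
        // One worker thread, so the second task has to wait in the pool's queue.
        ExecutorService pool = Executors.newFixedThreadPool(1);
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean interrupted = new AtomicBoolean(false);
        AtomicBoolean queuedRan = new AtomicBoolean(false);

        // Occupies the only worker until its thread is interrupted.
        Future<?> running = pool.submit(() -> {
            started.countDown();
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                interrupted.set(true);
            }
        });
        // Queued behind the first task; never gets a thread before it is cancelled.
        Future<?> queued = pool.submit(() -> queuedRan.set(true));

        started.await();
        queued.cancel(true);   // not started: it will never execute
        running.cancel(true);  // already running: its worker thread is interrupted

        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return new boolean[] {queuedRan.get(), interrupted.get()};
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = demo();
        System.out.println("queued task ran: " + r[0] + ", running task interrupted: " + r[1]);
    }
}
```

The queued task stays in the pool's internal queue after being cancelled, but when the worker eventually dequeues it, the cancelled FutureTask returns without running its body.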

After that refactoring, queue management becomes very simple.  The 
dispatcher takes from the queue, always processes the message, then is ready to 
take from the queue again.  The fixed thread pool executor takes a task, 
executes it, then is ready to take the next task if any.
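A minimal sketch of the proposed dispatcher, under stated assumptions: the Allocate/Deallocate message classes, the String task id used as the map key, and the TaskDispatcher name are all hypothetical stand-ins (the actual Tez scheduler events and types are not shown in this thread). One thread always takes from a plain FIFO queue, and submitting to the fixed pool never blocks, so the dispatcher never holds a message it cannot act on.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.*;

/** Hypothetical front-end dispatcher feeding a fixed-size executor. */
public class TaskDispatcher {
    interface Message {}

    static final class Allocate implements Message {
        final String taskId;
        final Runnable work;
        Allocate(String taskId, Runnable work) { this.taskId = taskId; this.work = work; }
    }

    static final class Deallocate implements Message {
        final String taskId;
        Deallocate(String taskId) { this.taskId = taskId; }
    }

    private static final Message STOP = new Message() {};

    private final BlockingQueue<Message> queue = new LinkedBlockingQueue<>(); // plain FIFO
    // Touched only by the dispatcher thread, so no locking is needed.
    // Assumes every Allocate is eventually followed by a matching Deallocate.
    private final Map<String, Future<?>> futures = new HashMap<>();
    private final ExecutorService pool;
    private final Thread dispatcher;

    TaskDispatcher(int poolSize) {
        this.pool = Executors.newFixedThreadPool(poolSize);
        this.dispatcher = new Thread(this::dispatchLoop, "task-dispatcher");
    }

    void start() { dispatcher.start(); }

    void send(Message m) { queue.add(m); }

    /** Processes every message queued so far, then stops the dispatcher.
     *  Tasks already handed to the pool (and not cancelled) still run. */
    void stop() throws InterruptedException {
        queue.add(STOP);
        dispatcher.join();
        pool.shutdown();
    }

    boolean awaitTermination(long timeout, TimeUnit unit) throws InterruptedException {
        return pool.awaitTermination(timeout, unit);
    }

    private void dispatchLoop() {
        try {
            while (true) {
                Message m = queue.take();  // always take; no peeking or re-queueing
                if (m == STOP) {
                    return;
                } else if (m instanceof Allocate) {
                    Allocate a = (Allocate) m;
                    // submit() does not block: a busy pool just queues the task.
                    futures.put(a.taskId, pool.submit(a.work));
                } else if (m instanceof Deallocate) {
                    Future<?> f = futures.remove(((Deallocate) m).taskId);
                    if (f != null) {
                        // Not started: never runs. Running: its thread is interrupted.
                        f.cancel(true);
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because the dispatcher is the only consumer of a FIFO queue, a deallocate sent after an allocate is always processed after it, so the map lookup cannot race with the submission.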

> Tez Local Mode can hang for cases
> ---------------------------------
>                 Key: TEZ-3893
>                 URL: https://issues.apache.org/jira/browse/TEZ-3893
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>            Priority: Major
>         Attachments: TEZ-3893.002.patch, TEZ-3893.1.patch
> The scheduler has a race condition where events that notify can be added 
> while the blocking queue is not waiting, but just before waiting. In this 
> case, we can wait forever.
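The race in the description has the shape of a classic lost wakeup: a consumer decides the queue is empty, a producer adds an event and notifies, and only then does the consumer start waiting, for a notification that has already happened. The sketch below is a hypothetical Mailbox class, not the Tez scheduler code; it shows the guarded-wait pattern that rules this out, with the emptiness check and wait() under the same lock and inside a loop. BlockingQueue.take() provides the same guarantee.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical mailbox illustrating a wait that cannot miss a notify. */
public class Mailbox<T> {
    private final Deque<T> items = new ArrayDeque<>();

    // BROKEN shape, for contrast only (check outside the lock):
    //   if (items.isEmpty()) {
    //       // <-- producer adds an item and calls notifyAll() here
    //       synchronized (this) { wait(); }  // waits for a notify that already fired
    //   }

    public void put(T item) {
        synchronized (this) {
            items.addLast(item);
            notifyAll();  // state change and notification under the same lock
        }
    }

    public T take() throws InterruptedException {
        synchronized (this) {
            // put() cannot run between this check and wait(), because both need the
            // lock, and wait() releases it atomically. The loop also covers
            // spurious wakeups.
            while (items.isEmpty()) {
                wait();
            }
            return items.removeFirst();
        }
    }
}
```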

This message was sent by Atlassian JIRA