[ https://issues.apache.org/jira/browse/TEZ-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361609#comment-16361609 ]
Jason Lowe commented on TEZ-3893:
---------------------------------

Thanks for the patch!

A lot of the fragility in this code stems from the fact that there are items in the queue that we can process and items we cannot, and we're trying to juggle them in the same queue. I'm wondering if this gets a lot cleaner if it is refactored into two parts: a front-end dispatcher/handler and a fixed-size thread pool executor to do the executions.

The front-end _always_ pulls from the queue (just FIFO, not priority). If the message is an allocate, the dispatcher schedules the task with the fixed thread pool executor and tracks the Future from that schedule in a map. If the message is a deallocate, it looks up the Future in the map and cancels it, which prevents the task from executing if it has not started yet, or interrupts the thread that is currently executing it.

After that refactoring, the queue management becomes very simple. The dispatcher takes from the queue, always processes the message, then is ready to take from the queue again. The fixed thread pool executor takes a task, executes it, then is ready to take the next task, if any.

> Tez Local Mode can hang for cases
> ---------------------------------
>
>                 Key: TEZ-3893
>                 URL: https://issues.apache.org/jira/browse/TEZ-3893
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>            Priority: Major
>         Attachments: TEZ-3893.002.patch, TEZ-3893.1.patch
>
>
> The scheduler has a race condition where events that notify can be added
> while the blocking queue is not waiting, but just before waiting. In this
> case, we can wait forever.
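
For illustration only, the dispatcher/executor split proposed in the comment above might look roughly like the sketch below. It is not taken from the Tez code base or from either attached patch: the class name LocalDispatcherSketch, the Message type, and the allocate/deallocate/shutdown methods are all hypothetical, and only standard java.util.concurrent classes are used.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch; none of these names come from the Tez code base.
class LocalDispatcherSketch {
    // A queued request. A null 'work' marks a deallocate for that task id.
    static final class Message {
        final String taskId;
        final Runnable work;
        Message(String taskId, Runnable work) {
            this.taskId = taskId;
            this.work = work;
        }
    }

    // Plain FIFO queue: the dispatcher can always process whatever is at the head.
    private final BlockingQueue<Message> queue = new LinkedBlockingQueue<>();
    // Futures of submitted tasks, keyed by task id, so a deallocate can cancel them.
    private final Map<String, Future<?>> futures = new ConcurrentHashMap<>();
    private final ExecutorService executor;
    private final Thread dispatcher;

    public LocalDispatcherSketch(int slots) {
        executor = Executors.newFixedThreadPool(slots);
        dispatcher = new Thread(this::dispatchLoop, "dispatcher-sketch");
        dispatcher.setDaemon(true);
        dispatcher.start();
    }

    public void allocate(String taskId, Runnable work) {
        queue.add(new Message(taskId, work));
    }

    public void deallocate(String taskId) {
        queue.add(new Message(taskId, null));
    }

    // Take a message, always handle it, then go back to take() for the next one.
    private void dispatchLoop() {
        try {
            while (true) {
                Message m = queue.take();
                if (m.work != null) {
                    // Allocate: hand the task to the pool and remember its Future.
                    futures.put(m.taskId, executor.submit(m.work));
                } else {
                    // Deallocate: cancel(true) keeps a not-yet-started task from
                    // running and interrupts the worker thread if it is running.
                    Future<?> f = futures.remove(m.taskId);
                    if (f != null) {
                        f.cancel(true);
                    }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // shutdown requested
        }
    }

    public void shutdown() {
        dispatcher.interrupt();
        executor.shutdownNow();
    }
}
```

In this sketch the dispatcher never has to skip or re-queue a message, so it only ever blocks in take(). The sketch does not remove the Futures of tasks that complete on their own; a real implementation would need to clean those up as well.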