[jira] [Commented] (DRILL-4595) FragmentExecutor.fail() should interrupt the fragment thread to avoid possible query hangs

Sudheesh Katkam (JIRA) Fri, 08 Apr 2016 13:48:03 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232891#comment-15232891
 ]


Sudheesh Katkam commented on DRILL-4595:
----------------------------------------

\[copying my comment from dev list thread\].

In any scenario where a thread other than the fragment executor is failing (or 
cancelling) a fragment, that thread should change the state, and then interrupt 
the fragment executor.

> FragmentExecutor.fail() should interrupt the fragment thread to avoid 
> possible query hangs
> ------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4595
>                 URL: https://issues.apache.org/jira/browse/DRILL-4595
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>            Reporter: Deneche A. Hakim
>            Assignee: Deneche A. Hakim
>             Fix For: 1.7.0
>
>
> When a fragment fails it's assumed it will be able to close itself and send 
> it's FAILED state to the foreman which will cancel any running fragments. 
> FragmentExecutor.cancel() will interrupt the thread making sure those 
> fragment don't stay blocked.
> However, if a fragment is already blocked when it's fail method is called the 
> foreman may never be notified about this and the query will hang forever. One 
> such scenario is the following:
> - generally it's a CTAS running on a large cluster (lot's of writers running 
> in parallel)
> - logs show that the user channel was closed and UserServer caused the root 
> fragment to move to a FAILED state
> - jstack shows that the root fragment is blocked in it's receiver waiting for 
> data
> - jstack also shows that ALL other fragments are no longer running, and the 
> logs show that all of them succeeded
> - the foreman waits *forever* for the root fragment to finish



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (DRILL-4595) FragmentExecutor.fail() should interrupt the fragment thread to avoid possible query hangs

Reply via email to