[ https://issues.apache.org/jira/browse/DRILL-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deneche A. Hakim updated DRILL-2582:
------------------------------------
    Description: 
We're having trouble reliably reporting the cascading failures that result 
from a failure or cancellation. This turns out to be because QueryManager 
manipulates Foreman's state indiscriminately, without paying any attention to 
its current state.

For example, suppose we request a cancellation of a query, and Foreman issues 
queryManager.cancelExecutingFragments(). Suppose that, in the meantime, a 
fragment fails. The fragment failure is picked up by 
QueryManager.statusUpdate(), which then uses stateListener to slam Foreman into 
the FAILED state. However, Foreman is in CANCELLATION_REQUESTED and is still 
waiting for the cancellation acknowledgements. The sudden move to FAILED shuts 
it down. Foreman will still send out a CANCELED terminal state, but it won't 
report the failure or any cascading failures from the cancellations.
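A minimal Java sketch of the problematic pattern, assuming heavily simplified 
types: NaiveQueryState, NaiveStateListener, NaiveForeman, NaiveQueryManager and 
all of their method signatures are hypothetical stand-ins, not Drill's actual 
classes. It only shows the shape of the problem: QueryManager pushes a state 
onto Foreman, and Foreman never checks what state it is already in.

```java
// Hypothetical, simplified query states; not Drill's actual enum.
enum NaiveQueryState { RUNNING, CANCELLATION_REQUESTED, CANCELED, FAILED }

// Hypothetical callback through which QueryManager sets Foreman's state directly.
interface NaiveStateListener {
  void moveToState(NaiveQueryState newState, Exception cause);
}

// Hypothetical Foreman that obeys every requested transition.
class NaiveForeman implements NaiveStateListener {
  NaiveQueryState state = NaiveQueryState.RUNNING;
  boolean shutDown = false;

  void cancel() {
    state = NaiveQueryState.CANCELLATION_REQUESTED;
  }

  @Override
  public void moveToState(NaiveQueryState newState, Exception cause) {
    if (shutDown) {
      return; // later updates, including cascading failures, are dropped
    }
    state = newState; // the current state is never consulted; cause is not recorded
    if (newState == NaiveQueryState.FAILED) {
      shutDown = true; // stops waiting for the cancellation acknowledgements
    }
  }
}

// Hypothetical QueryManager that turns any fragment failure into a FAILED query.
class NaiveQueryManager {
  private final NaiveStateListener stateListener;

  NaiveQueryManager(NaiveStateListener stateListener) {
    this.stateListener = stateListener;
  }

  void statusUpdate(boolean fragmentFailed, Exception cause) {
    if (fragmentFailed) {
      stateListener.moveToState(NaiveQueryState.FAILED, cause);
    }
  }
}
```

Calling cancel() and then statusUpdate(true, e) leaves the sketch in FAILED 
with shutDown set, even though cancellation acknowledgements are still pending.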

Instead, QueryManager should report fragment status updates to Foreman, and 
Foreman should decide which transition to make based on the fragment status 
update and its own current state. In the example above, a fragment failure 
notification that arrives after we're already in CANCELLATION_REQUESTED 
shouldn't result in any state transition at all; it should simply attach the 
fragment failure to any current suppressed deferred exceptions. This means 
QueryManager.statusUpdate() and QueryManager.fragmentDone() need to be 
reworked, and Foreman needs to give QueryManager a listener for reporting 
fragment status changes, rather than allowing it to manipulate Foreman's 
state directly.
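A minimal Java sketch of the proposed direction, under stated assumptions: 
FragmentStatusListener, QueryManagerSketch, ForemanSketch, both enums, and all 
method signatures are hypothetical and greatly simplified (no RPC, no real 
cancellation, and each fragment is assumed to report exactly one terminal 
state). QueryManager only forwards fragment status changes; Foreman decides the 
transition from its own current state and collects failures with Java's 
Throwable.addSuppressed, so a failure that arrives during cancellation is kept 
instead of forcing a transition.

```java
// Hypothetical, simplified query states; not Drill's actual QueryState enum.
enum QueryState { RUNNING, CANCELLATION_REQUESTED, COMPLETED, CANCELED, FAILED }

// Hypothetical, simplified fragment states reported to the Foreman.
enum FragmentState { RUNNING, FINISHED, CANCELED, FAILED }

// Hypothetical listener that Foreman hands to QueryManager. QueryManager only
// reports fragment status changes through it; it never sets the query state.
interface FragmentStatusListener {
  void fragmentStatusChanged(int fragmentId, FragmentState state, Exception cause);
}

// Hypothetical QueryManager: forwards each fragment status update to Foreman.
class QueryManagerSketch {
  private final FragmentStatusListener listener;

  QueryManagerSketch(FragmentStatusListener listener) {
    this.listener = listener;
  }

  void statusUpdate(int fragmentId, FragmentState state, Exception cause) {
    listener.fragmentStatusChanged(fragmentId, state, cause);
  }
}

// Hypothetical Foreman: owns its state and decides every transition itself.
class ForemanSketch implements FragmentStatusListener {
  private QueryState state = QueryState.RUNNING;
  private Exception deferred;   // first failure; later failures become suppressed
  private int pendingFragments; // assumes each fragment reports one terminal state

  ForemanSketch(int fragmentCount) {
    this.pendingFragments = fragmentCount;
  }

  synchronized QueryState state() { return state; }

  synchronized Exception deferredException() { return deferred; }

  synchronized void cancel() {
    if (state == QueryState.RUNNING) {
      // Foreman would also call queryManager.cancelExecutingFragments() here (omitted).
      state = QueryState.CANCELLATION_REQUESTED;
    }
  }

  // Keep every failure: the first is primary, the rest are attached as suppressed.
  private void addException(Exception e) {
    if (e == null) {
      return;
    }
    if (deferred == null) {
      deferred = e;
    } else if (e != deferred) {
      deferred.addSuppressed(e);
    }
  }

  @Override
  public synchronized void fragmentStatusChanged(int fragmentId, FragmentState fs, Exception cause) {
    if (fs != FragmentState.RUNNING) {
      pendingFragments--;
    }
    if (fs == FragmentState.FAILED) {
      addException(cause);
      // Only a running query moves to FAILED (treated as final in this sketch).
      // During CANCELLATION_REQUESTED the failure is recorded, but Foreman keeps
      // waiting for the remaining acknowledgements.
      if (state == QueryState.RUNNING) {
        state = QueryState.FAILED;
      }
    }
    if (pendingFragments == 0) {
      if (state == QueryState.CANCELLATION_REQUESTED) {
        state = QueryState.CANCELED; // deferred still holds any failures to report
      } else if (state == QueryState.RUNNING) {
        state = QueryState.COMPLETED;
      }
    }
  }
}

class ForemanSketchDemo {
  public static void main(String[] args) {
    ForemanSketch foreman = new ForemanSketch(2);
    QueryManagerSketch queryManager = new QueryManagerSketch(foreman);
    foreman.cancel();
    queryManager.statusUpdate(0, FragmentState.FAILED, new RuntimeException("fragment 0 failed"));
    System.out.println(foreman.state()); // CANCELLATION_REQUESTED
    queryManager.statusUpdate(1, FragmentState.CANCELED, null);
    System.out.println(foreman.state()); // CANCELED
    System.out.println(foreman.deferredException().getMessage()); // fragment 0 failed
  }
}
```

Because the transition logic lives in one synchronized method on the Foreman 
side, reading the current state and deciding the next one happen together, and 
the terminal CANCELED state can still carry the failures collected along the way.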

  was:
We're having trouble reliably reporting the cascading failures that result 
from a failure or cancellation. This turns out to be because QueryManager 
manipulates Foreman's state indiscriminately, without paying any attention to 
its current state.

For example, suppose we request a cancellation of a query, and Foreman issues 
queryManager.cancelExecutingFragments(). Suppose that, in the meantime, a 
fragment fails. The fragment failure is picked up by 
QueryManager.statusUpdate(), which then uses stateListener to slam Foreman into 
the FAILED state. However, Foreman is in CANCELLATION_REQUESTED and is still 
waiting for the cancellation acknowledgements. The sudden move to FAILED shuts 
it down and sends out a FAILURE message instead of the expected CANCELED 
terminal state, and it won't report any cascading failures from the 
cancellations.

Instead, QueryManager should report fragment status updates to Foreman, and 
Foreman should decide which transition to make based on the fragment status 
update and its own current state. In the example above, a fragment failure 
notification that arrives after we're already in CANCELLATION_REQUESTED 
shouldn't result in any state transition at all; it should simply attach the 
fragment failure to any current suppressed deferred exceptions. This means 
QueryManager.statusUpdate() and QueryManager.fragmentDone() need to be 
reworked, and Foreman needs to give QueryManager a listener for reporting 
fragment status changes, rather than allowing it to manipulate Foreman's 
state directly.


> QueryManager shouldn't be manipulating Foreman's state directly
> ---------------------------------------------------------------
>
>                 Key: DRILL-2582
>                 URL: https://issues.apache.org/jira/browse/DRILL-2582
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 0.8.0
>            Reporter: Chris Westin
>            Assignee: Deneche A. Hakim
>             Fix For: 0.9.0
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
