Chris Westin created DRILL-2582:
-----------------------------------
Summary: QueryManager shouldn't be manipulating Foreman's state
directly
Key: DRILL-2582
URL: https://issues.apache.org/jira/browse/DRILL-2582
Project: Apache Drill
Issue Type: Bug
Components: Execution - Flow
Affects Versions: 0.8.0
Reporter: Chris Westin
Assignee: Deneche A. Hakim
Fix For: 0.9.0
We don't always report the cascading failures that result from a failure or
cancellation. This turns out to be because QueryManager manipulates Foreman's
state indiscriminately, without paying any attention to Foreman's current
state.
For example, suppose we request a cancellation of a query, and Foreman issues
queryManager.cancelExecutingFragments. However, in the meantime, suppose a
fragment failed. The fragment failure will be picked up by
QueryManager.statusUpdate(), which then uses stateListener to slam Foreman to
the FAILED state. However, Foreman is in CANCELLATION_REQUESTED at that point,
waiting for the cancellation acknowledgements. The sudden move to FAILED shuts
it down, so it sends out a FAILURE message instead of reaching the expected
CANCELED terminal state, and it doesn't report any cascading failures from the
cancellations.
What should happen is that QueryManager should instead report on fragment
status updates to Foreman, and Foreman should decide what transition to make
based on the fragment status update and its own current state. In the above, a
fragment failure notification after we're already in CANCELLATION_REQUESTED
shouldn't result in any state transition at all, but should simply attach the
fragment failure to any current suppressed deferred exceptions. This means
QueryManager.statusUpdate() and QueryManager.fragmentDone() need to be
reworked, and Foreman needs to give QueryManager a listener for reporting
fragment status changes, rather than allowing it to directly manipulate the
Foreman's state.
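
As a rough illustration of the proposed shape, here is a minimal sketch in
which QueryManager only reports fragment status through a listener and Foreman
decides on the transition. All names here (FragmentStatusListener,
ForemanSketch, the two enums, the pending-fragment counter) are hypothetical
stand-ins, not Drill's actual classes. The sketch also simplifies the RUNNING
case by moving straight to FAILED on a fragment failure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; not Drill's real types.
enum QueryState { RUNNING, CANCELLATION_REQUESTED, COMPLETED, CANCELED, FAILED }

enum FragmentState { RUNNING, FINISHED, CANCELLED, FAILED }

// Assumed listener: QueryManager reports status changes, Foreman decides.
interface FragmentStatusListener {
    void fragmentStatusChanged(FragmentState status, Exception cause);
}

class ForemanSketch implements FragmentStatusListener {
    private QueryState state = QueryState.RUNNING;
    private final List<Exception> deferred = new ArrayList<>();
    private int pendingFragments;

    ForemanSketch(int fragmentCount) {
        this.pendingFragments = fragmentCount;
    }

    // A cancellation request only takes effect while the query is running.
    synchronized void cancel() {
        if (state == QueryState.RUNNING) {
            state = QueryState.CANCELLATION_REQUESTED;
        }
    }

    @Override
    public synchronized void fragmentStatusChanged(FragmentState status, Exception cause) {
        if (status == FragmentState.RUNNING) {
            return; // not a terminal fragment status, nothing to decide
        }
        if (state == QueryState.COMPLETED || state == QueryState.CANCELED
                || state == QueryState.FAILED) {
            return; // query already in a terminal state, ignore late reports
        }
        pendingFragments--;
        if (status == FragmentState.FAILED) {
            deferred.add(cause); // always keep the failure for later reporting
            if (state == QueryState.RUNNING) {
                state = QueryState.FAILED; // simplification: fail immediately
                return;
            }
            // In CANCELLATION_REQUESTED: no transition, failure stays deferred.
        }
        if (pendingFragments == 0) {
            state = (state == QueryState.CANCELLATION_REQUESTED)
                    ? QueryState.CANCELED
                    : QueryState.COMPLETED;
        }
    }

    synchronized QueryState getState() {
        return state;
    }

    synchronized List<Exception> getDeferred() {
        return new ArrayList<>(deferred);
    }
}
```

With this arrangement, a fragment failure that arrives after cancel() leaves
the state at CANCELLATION_REQUESTED and is only recorded; the query reaches
CANCELED once the remaining fragments have reported in.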