[
https://issues.apache.org/jira/browse/DRILL-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sorabh Hamirwasia reassigned DRILL-5721:
----------------------------------------
Assignee: Sorabh Hamirwasia
> Query with only root fragment and no non-root fragment hangs when Drillbit to
> Drillbit Control Connection has network issues
> ----------------------------------------------------------------------------------------------------------------------------
>
> Key: DRILL-5721
> URL: https://issues.apache.org/jira/browse/DRILL-5721
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Sorabh Hamirwasia
> Assignee: Sorabh Hamirwasia
>
> Recently I found an issue (thanks to [~knguyen] for creating this scenario)
> related to fragment status reporting and would like some feedback on it.
> When a client submits a query to the Foreman, the Foreman plans it and then
> schedules fragments on root and non-root nodes. The Foreman creates a
> DrillbitStatusListener and a FragmentStatusListener to track the health of
> a Drillbit node and of a fragment, respectively. Root and non-root
> fragments are set up by the Foreman in different ways:
> Root fragments are set up without any communication over the control channel
> (since they are executed locally on the Foreman).
> Non-root fragments are set up by sending a control message
> (REQ_INITIALIZE_FRAGMENTS_VALUE) over the wire. If sending any such control
> message fails (for example, due to a network hiccup) during query setup,
> the query fails and the client is notified.
> Each fragment is executed on its node with the help of the Fragment Executor,
> which has an instance of FragmentStatusReporter. The FragmentStatusReporter
> sends the status of a fragment to the Foreman node over a control tunnel or
> connection using an RPC message (REQ_FRAGMENT_STATUS), for both root and
> non-root fragments.
> Based on the above, a root fragment is set up locally without any RPC
> communication, whereas its status is reported by the fragment executor over
> the control connection by sending an RPC message. For a non-root fragment,
> both setup and status updates happen through RPC messages over the control
> connection.
> *Issue 1:*
> What was observed is that for a simple query with only one root fragment
> running on the Foreman node, setup works fine. But when the fragment tries
> to create a control connection as part of a status update and fails to
> establish it, the query hangs. This is because the root fragment completes
> execution but fails to update the Foreman about it, so the Foreman thinks
> that the query is still running, forever.
> *Proposed Solution:*
> For a root fragment, setup already happens locally without an RPC
> message, so we can do the same for status updates of root fragments. This
> avoids RPC communication for status updates of fragments running locally on
> the Foreman and hence resolves Issue 1.
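
The proposed solution above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Drill's actual code: every type and method name here (StatusSink, ControlTunnelSink, ForemanSketch, StatusReporterSketch) is hypothetical. The idea shown is that the reporter picks a local, in-process path when the fragment is the root fragment on the Foreman, and only uses the control connection for non-root fragments.

```java
import java.util.ArrayList;
import java.util.List;

// All names in this sketch are hypothetical; they do not mirror Drill's classes.
enum FragmentState { RUNNING, FINISHED, FAILED }

/** Anything that can accept a fragment status update on behalf of the Foreman. */
interface StatusSink {
    void statusUpdate(String fragmentId, FragmentState state);
}

/** Stand-in for the Foreman; records every status update it receives. */
final class ForemanSketch implements StatusSink {
    final List<String> received = new ArrayList<>();

    @Override
    public void statusUpdate(String fragmentId, FragmentState state) {
        received.add(fragmentId + ":" + state);
    }
}

/** Stand-in for the control connection, which may fail to be established. */
final class ControlTunnelSink implements StatusSink {
    private final boolean networkUp;
    private final StatusSink remoteForeman;

    ControlTunnelSink(boolean networkUp, StatusSink remoteForeman) {
        this.networkUp = networkUp;
        this.remoteForeman = remoteForeman;
    }

    @Override
    public void statusUpdate(String fragmentId, FragmentState state) {
        if (!networkUp) {
            // Models the failure described in Issue 1.
            throw new IllegalStateException("control connection could not be established");
        }
        remoteForeman.statusUpdate(fragmentId, state);
    }
}

/** Chooses the reporting path once, when the fragment is set up. */
final class StatusReporterSketch {
    private final StatusSink sink;

    StatusReporterSketch(boolean rootFragmentOnForeman,
                         StatusSink localForeman,
                         StatusSink controlTunnel) {
        // Root fragment: bypass RPC and call the Foreman directly.
        // Non-root fragment: keep using the control connection.
        this.sink = rootFragmentOnForeman ? localForeman : controlTunnel;
    }

    void report(String fragmentId, FragmentState state) {
        sink.statusUpdate(fragmentId, state);
    }
}

final class StatusReportingDemo {
    public static void main(String[] args) {
        ForemanSketch foreman = new ForemanSketch();
        StatusSink brokenTunnel = new ControlTunnelSink(false, foreman);

        // The root fragment still reaches the Foreman although the tunnel is down.
        new StatusReporterSketch(true, foreman, brokenTunnel)
                .report("0:0", FragmentState.FINISHED);
        System.out.println(foreman.received);
    }
}
```

In this sketch, a broken control connection only affects non-root fragments, for which a send failure surfaces as an exception instead of a silent hang. How such failures should be handled for non-root fragments is outside the scope of Issue 1.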
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)