[ 
https://issues.apache.org/jira/browse/DRILL-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137158#comment-16137158
 ] 

Sorabh Hamirwasia commented on DRILL-5721:
------------------------------------------

[~parthc] - Please help to review this PR. For some reason the JIRA is not 
picking up the PR automatically should I close and re-open the PR ?

> Query with only root fragment and no non-root fragment hangs when Drillbit to 
> Drillbit Control Connection has network issues
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5721
>                 URL: https://issues.apache.org/jira/browse/DRILL-5721
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Sorabh Hamirwasia
>            Assignee: Sorabh Hamirwasia
>             Fix For: 1.12.0
>
>
> Recently I found an issue (Thanks to [~knguyen] to create this scenario) 
> related to Fragment Status reporting and would like some feedback on it. 
> When a client submits a query to Foreman, then it is planned by Foreman and 
> later fragments are scheduled to root and non-root nodes. Foreman creates a 
> DriilbitStatusListener and FragmentStatusListener to know about the health of 
> Drillbit node and a fragment respectively. The way root and non-root 
> fragments are setup by Foreman are different: 
> Root fragments are setup without any communication over control channel 
> (since it is executed locally on Foreman)
> Non-root fragments are setup by sending control message 
> (REQ_INITIALIZE_FRAGMENTS_VALUE) over wire. If there is failure in sending 
> any such control message (like due to network hiccup's) during query setup 
> then the query is failed and client is notified. 
> Each fragment is executed on it's node with the help Fragment Executor which 
> has an instance for FragmentStatusReporter. FragmentStatusReporter helps to 
> update the status of a fragment to Foreman node over a control tunnel or 
> connection using RPC message (REQ_FRAGMENT_STATUS) both for root and non-root 
> fragments. 
> Based on above when root fragment is submitted for setup then it is done 
> locally without any RPC communication whereas when status for that fragment 
> is reported by fragment executor that happens over control connection by 
> sending a RPC message. But for non-root fragment setup and status update both 
> happens using RPC message over control connection.
> *Issue 1:*
> What was observed is if for a simple query which has only 1 root fragment 
> running on Foreman node then setup will work fine. But as part of status 
> update when the fragment tries to create a control connection and fails to 
> establish that, then the query hangs. This is because the root fragment will 
> complete execution but will fail to update Foreman about it and Foreman think 
> that the query is running for ever. 
> *Proposed Solution:*
> For root fragment the setup of fragment is happening locally without RPC 
> message, so we can do the same for status update of root fragments. This will 
> avoid RPC communication for status update of fragments running locally on the 
> foreman and hence will resolve issue 1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to