[ 
https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903974#comment-15903974
 ] 

ASF GitHub Bot commented on DRILL-5316:
---------------------------------------

Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/772#discussion_r105286121
  
    --- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
    @@ -2143,6 +2146,9 @@ connectionStatus_t PooledDrillClientImpl::connect(const char* connStr, DrillUser
                 Utils::shuffle(drillbits);
                 // The original shuffled order is maintained if we shuffle first and then add any missing elements
                 Utils::add(m_drillbits, drillbits);
    +            if (m_drillbits.empty()){
    +                return handleConnError(CONN_FAILURE, getMessage(ERR_CONN_ZKNODBIT));
    --- End diff --
    
    Since we are not removing offline nodes from m_drillbits, I think we 
should return the connection error before the shuffle. Say that on the first 
client connection we get all the active nodes from ZooKeeper and store them in 
m_drillbits, and then all of those nodes die or go offline. On the next 
connection request ZooKeeper will return zero drillbits, but since m_drillbits 
is not empty we will still try to connect and fail later. 
    
    Instead, I think zero drillbits returned from ZooKeeper is a good 
indication that we won't be able to connect to any node already present in 
m_drillbits, so shouldn't we fail right there?
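
The ordering suggested above could look roughly like the sketch below. It is not the Drill client code: ConnStatus, addMissing and refreshDrillbits are hypothetical names, addMissing only approximates what Utils::add is described as doing in the diff comment, and std::shuffle stands in for Utils::shuffle. The point it illustrates is that the emptiness check is applied to the list ZooKeeper just returned, before it is shuffled and merged into the cached list.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <string>
#include <vector>

// Hypothetical status type; the real client returns connectionStatus_t.
enum class ConnStatus { Ok, Failure };

// Hypothetical stand-in for Utils::add: appends the entries of `fresh` that
// `cached` does not already hold, keeping the order of `fresh`.
void addMissing(std::vector<std::string>& cached,
                const std::vector<std::string>& fresh) {
    for (const std::string& bit : fresh) {
        if (std::find(cached.begin(), cached.end(), bit) == cached.end()) {
            cached.push_back(bit);
        }
    }
}

// `fromZk` is what ZooKeeper returned for this attempt; `cached` plays the
// role of m_drillbits and may still hold nodes from earlier connections.
ConnStatus refreshDrillbits(std::vector<std::string>& cached,
                            std::vector<std::string> fromZk) {
    // Fail on the fresh result, before shuffling or merging: a non-empty
    // cache says nothing about which nodes are alive right now.
    if (fromZk.empty()) {
        return ConnStatus::Failure;
    }
    static std::mt19937 rng(std::random_device{}());
    std::shuffle(fromZk.begin(), fromZk.end(), rng);
    addMissing(cached, fromZk);
    assert(!cached.empty());  // holds because fromZk was non-empty
    return ConnStatus::Ok;
}
```

With this ordering, a cache that still lists dead nodes cannot mask an empty ZooKeeper result: the call returns the failure status and leaves the cached list untouched.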


> C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children 
> completed with ZOK
> --------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5316
>                 URL: https://issues.apache.org/jira/browse/DRILL-5316
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Client - C++
>            Reporter: Rob Wu
>            Priority: Critical
>
> When connecting to a drillbit through ZooKeeper, the C++ client occasionally 
> crashes for no apparent reason.
> A closer look at the code shows that during the call 
> rc=zoo_get_children(p_zh.get(), m_path.c_str(), 0, &drillbitsVector); 
> zoo_get_children returns ZOK (0) but drillbitsVector.count is 0.
> This leaves drillbits empty, so 
> err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint); 
> crashes.
> A size check should be added to prevent this.
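
The size check requested in the description can be sketched as follows. This is a minimal illustration, not the actual fix: lastDrillbit is a hypothetical helper, and the drillbit list is assumed here to be a std::vector of strings. It shows why the unchecked expression is dangerous: size() is unsigned, so on an empty vector size() - 1 wraps around to a very large index instead of going negative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: copies the last drillbit name into `out` and returns
// true, or returns false without touching `out` when the list is empty.
bool lastDrillbit(const std::vector<std::string>& drillbits, std::string& out) {
    if (drillbits.empty()) {
        // Without this check, size() - 1 wraps around to a huge unsigned
        // value and the index below reads out of bounds.
        return false;
    }
    const std::size_t last = drillbits.size() - 1;
    assert(last < drillbits.size());  // no wrap-around past the check
    out = drillbits[last];
    return true;
}
```

A caller would treat a false return as a connection error rather than going on to resolve an endpoint.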



