[ https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903974#comment-15903974 ]
ASF GitHub Bot commented on DRILL-5316:
---------------------------------------
Github user sohami commented on a diff in the pull request:
https://github.com/apache/drill/pull/772#discussion_r105286121
--- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp ---
@@ -2143,6 +2146,9 @@ connectionStatus_t PooledDrillClientImpl::connect(const char* connStr, DrillUser
     Utils::shuffle(drillbits);
     // The original shuffled order is maintained if we shuffle first and then add any missing elements
     Utils::add(m_drillbits, drillbits);
+    if (m_drillbits.empty()){
+        return handleConnError(CONN_FAILURE, getMessage(ERR_CONN_ZKNODBIT));
--- End diff --
Since we do not remove offline nodes from m_drillbits, I think we should
return the connection error before the shuffle. Say that on the first client
connection we get all the active nodes from ZooKeeper and store them in
m_drillbits, and then all of those nodes go dead or offline. On the next
connection request, ZooKeeper will return zero drillbits, but because
m_drillbits is not empty we will still try to connect and only fail later.
Instead, I think zero drillbits returned from ZooKeeper is a good indication
that we won't be able to connect to any node already present in m_drillbits,
so shouldn't we fail right there?
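
The ordering proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Drill client code: `PooledClientSketch`, `ConnStatus`, `cached` and `fromZk` are hypothetical stand-ins for the pooled client, its status codes, m_drillbits and the list returned by ZooKeeper, and the standard library is used in place of Utils::shuffle and Utils::add.

```cpp
#include <algorithm>
#include <random>
#include <string>
#include <vector>

// Hypothetical stand-in for the client's connection status codes.
enum ConnStatus { CONN_SUCCESS, CONN_FAILURE };

// Hypothetical stand-in for the pooled client; only the cached list matters here.
struct PooledClientSketch {
    std::vector<std::string> cached;  // plays the role of m_drillbits

    // fromZk plays the role of the drillbit list just returned by ZooKeeper.
    ConnStatus connect(std::vector<std::string> fromZk) {
        // Proposed ordering: an empty ZooKeeper result fails immediately,
        // even if the cache still holds nodes from an earlier connection.
        if (fromZk.empty()) {
            return CONN_FAILURE;
        }
        std::shuffle(fromZk.begin(), fromZk.end(),
                     std::mt19937{std::random_device{}()});
        // Append only nodes that are not cached yet, so the earlier order is kept.
        for (const std::string& node : fromZk) {
            if (std::find(cached.begin(), cached.end(), node) == cached.end()) {
                cached.push_back(node);
            }
        }
        return CONN_SUCCESS;
    }
};
```

With the check placed on the fresh ZooKeeper result rather than on the merged list, a second call with an empty result fails right away instead of attempting connections to stale cached nodes.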
> C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children
> completed with ZOK
> --------------------------------------------------------------------------------------------
>
> Key: DRILL-5316
> URL: https://issues.apache.org/jira/browse/DRILL-5316
> Project: Apache Drill
> Issue Type: Bug
> Components: Client - C++
> Reporter: Rob Wu
> Priority: Critical
>
> When connecting to a drillbit through ZooKeeper, the C++ client would
> occasionally crash for no apparent reason.
> A closer look at the code revealed that during this call
> rc=zoo_get_children(p_zh.get(), m_path.c_str(), 0, &drillbitsVector);
> zoo_get_children returns ZOK (0) but drillbitsVector.count is 0.
> This causes drillbits to stay empty, and thus
> err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint);
> crashes.
> A size check should be done to prevent this from happening.
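
A minimal sketch of the size check requested in the issue, assuming the drillbit list is a std::vector of node names. `pickLastDrillbit` is a hypothetical helper introduced only for illustration and is not part of the Drill client.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: reports failure instead of indexing an empty vector.
// With an empty vector, size() - 1 wraps around to the largest size_t value,
// so drillbits[drillbits.size() - 1] would be an out-of-bounds access.
bool pickLastDrillbit(const std::vector<std::string>& drillbits,
                      std::string& out) {
    if (drillbits.empty()) {
        return false;  // caller can turn this into a connection error
    }
    out = drillbits.back();
    return true;
}
```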