Github user sohami commented on a diff in the pull request: https://github.com/apache/drill/pull/772#discussion_r105286121 --- Diff: contrib/native/client/src/clientlib/drillClientImpl.cpp --- @@ -2143,6 +2146,9 @@ connectionStatus_t PooledDrillClientImpl::connect(const char* connStr, DrillUser Utils::shuffle(drillbits); // The original shuffled order is maintained if we shuffle first and then add any missing elements Utils::add(m_drillbits, drillbits); + if (m_drillbits.empty()){ + return handleConnError(CONN_FAILURE, getMessage(ERR_CONN_ZKNODBIT)); --- End diff -- Since we are not removing the offline nodes from m_drillbits then I think we should return connection error before shuffle. Let's say on first client connection we get all the active node from zookeeper and store it in m_drillbits. Then all the nodes went dead or offline. In the next connection request, zookeeper will return zero drillbits but since m_drillbits is not empty we will still try to connect and fail later. Instead I think zero drillbits returned from zookeeper is a good indication that we won't be able to connect to any other node already present inside m_drillbits and should fail there itself ?
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---