Hi,
I have been looking into an issue where the native client hangs after failing to handshake during connection. We have some boundary conditions where a connection is attempted while the Drill service is still starting, which results in handshake failures.

--

DrillClient::connect() ultimately ends up in DrillClientImpl::recvHandshake().

This function calls async_read() with a callback to DrillClientImpl::handleHandshake to handle the results of handshaking. However, on error, DrillClientImpl::handleHandshake ends up calling handleConnError() which merely calls shutdownSocket() to kill the socket and set m_bIsConnected to false. When all that unfolds, DrillClientImpl::validateHandshake() ends up returning CONN_SUCCESS which is clearly wrong because the handshake failed.

The original caller of DrillClient::connect() thinks everything is hunky-dory.

The following added to DrillClientImpl::recvHandshake() after the m_io_service.run() line to check the result seems to fix the problem:

    if (!m_bIsConnected)
    {
        return CONN_HANDSHAKE_FAILED;
    }

...but I wonder if there is a better solution.
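
To make the failure path concrete, here is a stripped-down model of the flow described above. It is only a sketch: it does not use Boost, MiniClient and its run() method are hypothetical stand-ins for DrillClientImpl and m_io_service.run(), and the enum is a cut-down stand-in for the real status type. It shows the error handler clearing m_bIsConnected while the caller still receives CONN_SUCCESS unless the flag is checked after run() returns.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Stand-in for the client's connection status type; only two values are modelled.
enum connectionStatus_t { CONN_SUCCESS, CONN_HANDSHAKE_FAILED };

// Hypothetical miniature of DrillClientImpl; only the control flow matters here.
class MiniClient {
public:
    // simulateError == true models the handshake read completing with an error.
    explicit MiniClient(bool simulateError) : m_simulateError(simulateError) {}

    // checkAfterRun toggles the proposed post-run() check so both
    // behaviours can be compared side by side.
    connectionStatus_t recvHandshake(bool checkAfterRun) {
        m_bIsConnected = true;  // the socket is up before the handshake read

        // Stand-in for async_read(): queue the completion handler.
        m_pending = [this] { handleHandshake(m_simulateError); };

        run();  // stand-in for m_io_service.run()

        if (checkAfterRun && !m_bIsConnected) {
            return CONN_HANDSHAKE_FAILED;  // the proposed check
        }
        return CONN_SUCCESS;  // what the caller sees without the check
    }

    bool isConnected() const { return m_bIsConnected; }

private:
    // Runs the single queued handler, then returns, like a drained io_service.
    void run() {
        assert(m_pending && "a handler must be queued before run()");
        std::function<void()> handler = std::move(m_pending);
        m_pending = nullptr;
        handler();
    }

    // On error this mirrors handleConnError() -> shutdownSocket():
    // the flag is cleared, but nothing is reported back to the caller.
    void handleHandshake(bool error) {
        if (error) {
            m_bIsConnected = false;
        }
    }

    bool m_simulateError;
    bool m_bIsConnected = false;
    std::function<void()> m_pending;
};
```

With simulateError set, recvHandshake(false) returns CONN_SUCCESS even though isConnected() is false afterwards, which is the behaviour I am seeing; recvHandshake(true) returns CONN_HANDSHAKE_FAILED.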

Also, although there is the IsActive() member that applications can call to determine if the connection is up, I wonder if the front-line Drill calls (such as submitQuery()) could check the connection status as a matter of course. Currently, if you attempt a submitQuery() call when the connection is down, it just hangs because m_io_service is not running and m_deadlineTimer never triggers as a fallback.
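
A minimal sketch of the kind of guard I have in mind, assuming the entry point can simply consult the connected flag before queuing any work. GuardedClient, SubmitResult and the simplified submitQuery() signature are hypothetical and do not match the real API; the point is only that the call fails fast instead of waiting on an I/O service that is not running.

```cpp
#include <string>

// Hypothetical result type for illustration; not the client's real status enum.
enum class SubmitResult { Ok, NotConnected };

// Hypothetical miniature client with a simplified submitQuery() signature.
class GuardedClient {
public:
    explicit GuardedClient(bool connected) : m_bIsConnected(connected) {}

    SubmitResult submitQuery(const std::string& plan) {
        // Fail fast: with the connection down nothing would ever complete
        // the request, so report the problem instead of blocking.
        if (!m_bIsConnected) {
            return SubmitResult::NotConnected;
        }
        m_lastPlan = plan;  // stand-in for sending the request to the server
        return SubmitResult::Ok;
    }

    const std::string& lastPlan() const { return m_lastPlan; }

private:
    bool m_bIsConnected;
    std::string m_lastPlan;
};
```

An application would then see an immediate error from submitQuery() on a dead connection rather than having to remember to call IsActive() first.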

Opinions?

Cheers,
Ralph Little
