IMPALA-3875: Thrift threaded server hang in some cases We use socket timeouts for our backend connections. We set these timeouts only after we've open()'d the connection, which ideally should be fine. However, our TSaslTransport stack does read()'s and write()'s over the network on an open(), which means that on a secure cluster we send and recieve non-TCP-handshake packets on open(). This is because the current code tries to establish a SASL handshake during open().
If for any reason the peer server does not respond to the read()'s during the open() call (after connect() is successful), the client will wait on read() indefinitely. This patch sets the socket timeout before we call open(), so that the read()'s and write()'s during the open() are subject to the timeout as well. We should also consider making a larger change where this SASL handshake does not take place during an open(), but instead after the open() call is completed, so as to have the open() semantics be the same for both secure and insecure clusters. Change-Id: I6c8f91a88f723e0e58e81bb385c5a8f190021868 Reviewed-on: http://gerrit.cloudera.org:8080/5263 Reviewed-by: Sailesh Mukil <[email protected]> Tested-by: Internal Jenkins Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/ff629c2d Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/ff629c2d Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/ff629c2d Branch: refs/heads/master Commit: ff629c2deb97b4ef25e80745bf4689dcbe8407fe Parents: 7efa083 Author: Sailesh Mukil <[email protected]> Authored: Tue Nov 29 13:12:48 2016 -0800 Committer: Internal Jenkins <[email protected]> Committed: Sat Dec 3 23:21:10 2016 +0000 ---------------------------------------------------------------------- be/src/rpc/thrift-client.h | 2 ++ be/src/runtime/client-cache.cc | 15 +++++++++++---- be/src/statestore/statestore-test.cc | 6 +----- 3 files changed, 14 insertions(+), 9 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ff629c2d/be/src/rpc/thrift-client.h ---------------------------------------------------------------------- diff --git a/be/src/rpc/thrift-client.h b/be/src/rpc/thrift-client.h index fae193a..a194551 100644 --- a/be/src/rpc/thrift-client.h +++ b/be/src/rpc/thrift-client.h @@ -67,6 +67,8 @@ class ThriftClientImpl { /// Set send timeout on the underlying TSocket. void setSendTimeout(int32_t ms) { socket_->setSendTimeout(ms); } + Status socket_create_status() { return socket_create_status_; } + protected: ThriftClientImpl(const std::string& ipaddress, int port, bool ssl) : address_(MakeNetworkAddress(ipaddress, port)), ssl_(ssl) { http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ff629c2d/be/src/runtime/client-cache.cc ---------------------------------------------------------------------- diff --git a/be/src/runtime/client-cache.cc b/be/src/runtime/client-cache.cc index 578ef29..4f403ad 100644 --- a/be/src/runtime/client-cache.cc +++ b/be/src/runtime/client-cache.cc @@ -110,15 +110,22 @@ Status ClientCacheHelper::CreateClient(const TNetworkAddress& address, ClientFactory factory_method, ClientKey* client_key) { shared_ptr<ThriftClientImpl> client_impl(factory_method(address, client_key)); VLOG(2) << "CreateClient(): creating new client for " << client_impl->address(); - Status status = client_impl->OpenWithRetry(num_tries_, wait_ms_); - if (!status.ok()) { - *client_key = NULL; - return status; + + if (!client_impl->socket_create_status().ok()) { + *client_key = nullptr; + return client_impl->socket_create_status(); } + // Set the TSocket's send and receive timeouts. client_impl->setRecvTimeout(recv_timeout_ms_); client_impl->setSendTimeout(send_timeout_ms_); + Status status = client_impl->OpenWithRetry(num_tries_, wait_ms_); + if (!status.ok()) { + *client_key = nullptr; + return status; + } + // Because the client starts life 'checked out', we don't add it to its host cache. { lock_guard<mutex> lock(client_map_lock_); http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/ff629c2d/be/src/statestore/statestore-test.cc ---------------------------------------------------------------------- diff --git a/be/src/statestore/statestore-test.cc b/be/src/statestore/statestore-test.cc index 2c518a1..8c81383 100644 --- a/be/src/statestore/statestore-test.cc +++ b/be/src/statestore/statestore-test.cc @@ -94,8 +94,4 @@ TEST(StatestoreSslTest, SmokeTest) { } -int main(int argc, char **argv) { - InitCommonRuntime(argc, argv, false); - ::testing::InitGoogleTest(&argc, argv); - return RUN_ALL_TESTS(); -} +IMPALA_TEST_MAIN();
