[
https://issues.apache.org/jira/browse/IMPALA-12114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17720081#comment-17720081
]
Joe McDonnell commented on IMPALA-12114:
----------------------------------------
Here is what is happening:
Our TSSLSocket is wrapped in a TBufferedTransport. The TBufferedTransport
implements peek() by calling read() on the underlying TSSLSocket (not peek()).
{noformat}
bool peek() override {
  if (rBase_ == rBound_) {
    setReadBuffer(rBuf_.get(), transport_->read(rBuf_.get(), rBufSize_));
  }
  return (rBound_ > rBase_);
}{noformat}
[https://github.com/apache/thrift/blob/master/lib/cpp/src/thrift/transport/TBufferTransports.h#L228-L233]
TSSLSocket has a field readRetryCount_. Each call into TSSLSocket::read()
either succeeds, in which case readRetryCount_ is reset to zero, or fails, in
which case readRetryCount_ stays incremented. In our case, we hit the timeout,
so the read is not successful and the counter is bumped for each peek() we do
on the TBufferedTransport.
{noformat}
bytes = SSL_read(ssl_, buf, len);
int32_t errno_copy = THRIFT_GET_SOCKET_ERROR;
int32_t error = SSL_get_error(ssl_, bytes);
readRetryCount_++;
if (error == SSL_ERROR_NONE) {
  readRetryCount_ = 0;
  break;
}{noformat}
[https://github.com/apache/thrift/blob/master/lib/cpp/src/thrift/transport/TSSLSocket.cpp#L425-L428]
When readRetryCount_ hits the limit (defaults to 5), we return 0 from the
TSSLSocket::read() call:
{noformat}
uint32_t TSSLSocket::read(uint8_t* buf, uint32_t len) {
  ...
  int32_t bytes = 0;
  while (readRetryCount_ < maxRecvRetries_) {
    ... the heart of the read logic, including the maintenance of
    readRetryCount_ ...
  }
  return bytes;
}{noformat}
[https://github.com/apache/thrift/blob/master/lib/cpp/src/thrift/transport/TSSLSocket.cpp#L420-L421]
[https://github.com/apache/thrift/blob/master/lib/cpp/src/thrift/transport/TSSLSocket.cpp#L490-L491]
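This causes peek() to return false, because rBase_ == rBound_ and the read
returned 0 bytes. Impala's idle-connection loop then exits because peek()
returned false, and the connection is closed. With
--idle_client_poll_period_s=30 from the reproduction steps and the default
limit of 5, five timed-out peeks would take about 150 seconds, which matches
the reported idle time.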
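To make the sequence concrete, here is a minimal C++ sketch of it. This is a
toy model, not Thrift or Impala code: StickyRetrySocket, BufferedPeeker,
ReadTimeout and TimeoutsBeforeDisconnect are hypothetical names, and the model
assumes a timed-out read surfaces as an exception that the caller catches
before peeking again, as the try/catch around peek() in the issue description
suggests.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a read timeout; not a Thrift type.
struct ReadTimeout : std::runtime_error {
  ReadTimeout() : std::runtime_error("read timed out") {}
};

// Toy socket whose retry counter is a member, so it survives across read() calls.
class StickyRetrySocket {
 public:
  explicit StickyRetrySocket(int max_retries) : max_retries_(max_retries) {}

  uint32_t read() {
    uint32_t bytes = 0;
    while (retry_count_ < max_retries_) {
      ++retry_count_;               // bumped before the outcome is known
      if (pending_bytes_ > 0) {     // successful read: counter is reset
        bytes = pending_bytes_;
        pending_bytes_ = 0;
        retry_count_ = 0;
        break;
      }
      throw ReadTimeout();          // timeout: counter keeps its new value
    }
    return bytes;                   // 0 if the loop body never ran
  }

  void deliver(uint32_t n) { pending_bytes_ = n; }  // simulate client data
  int retry_count() const { return retry_count_; }

 private:
  int max_retries_;
  int retry_count_ = 0;
  uint32_t pending_bytes_ = 0;
};

// Toy buffered transport: peek() is implemented with read(), not a real peek.
class BufferedPeeker {
 public:
  explicit BufferedPeeker(StickyRetrySocket& sock) : sock_(sock) {}

  bool peek() {
    if (buffered_ == 0) buffered_ = sock_.read();
    return buffered_ > 0;
  }

 private:
  StickyRetrySocket& sock_;
  uint32_t buffered_ = 0;
};

// Models a server polling an idle client. Returns how many polls timed out
// before peek() returned false, or -1 if that never happened in max_polls.
int TimeoutsBeforeDisconnect(int max_retries, int max_polls) {
  StickyRetrySocket sock(max_retries);
  BufferedPeeker transport(sock);
  int timeouts = 0;
  for (int poll = 0; poll < max_polls; ++poll) {
    try {
      if (!transport.peek()) return timeouts;  // looks like a closed connection
      return -1;                               // data arrived
    } catch (const ReadTimeout&) {
      ++timeouts;                              // idle period elapsed, poll again
    }
  }
  return -1;
}
```

In this model, TimeoutsBeforeDisconnect(5, 100) returns 5: the first five polls
time out and each leaves the counter one higher, and the sixth poll skips the
read loop entirely, so read() returns 0 and peek() reports false even though
the client never went away. A successful read in between (deliver() followed
by read()) would reset the counter and start the count over.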
> SSL Thrift connections disconnect if idle more than ~150 seconds
> ----------------------------------------------------------------
>
> Key: IMPALA-12114
> URL: https://issues.apache.org/jira/browse/IMPALA-12114
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.3.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Blocker
>
> A test cluster ran into issues with idle connections being disconnected when
> using SSL.
> This reproduces on my development environment with these steps:
> # Start Impala with SSL enabled
> {noformat}
> bin/start-impala-cluster.py
> --impalad_args="--ssl_client_ca_certificate=${IMPALA_HOME}/be/src/testutil/server-cert.pem
> --ssl_server_certificate=${IMPALA_HOME}/be/src/testutil/server-cert.pem
> --ssl_private_key=${IMPALA_HOME}/be/src/testutil/server-key.pem
> --hostname=localhost --idle_client_poll_period_s=30 -v=2"
> --state_store_args="--ssl_client_ca_certificate=${IMPALA_HOME}/be/src/testutil/server-cert.pem
> --ssl_server_certificate=${IMPALA_HOME}/be/src/testutil/server-cert.pem
> --ssl_private_key=${IMPALA_HOME}/be/src/testutil/server-key.pem
> --hostname=localhost"
> --catalogd_args="--ssl_client_ca_certificate=${IMPALA_HOME}/be/src/testutil/server-cert.pem
> --ssl_server_certificate=${IMPALA_HOME}/be/src/testutil/server-cert.pem
> --ssl_private_key=${IMPALA_HOME}/be/src/testutil/server-key.pem
> --hostname=localhost" --cluster_size=1{noformat}
> # Connect with impala-shell
> {noformat}
> impala-shell --ssl{noformat}
> # Leave this idle for 150+ seconds
> The impalad logs will contain a statement like this:
> {noformat}
> I0503 22:11:53.233147 206554 impala-server.cc:2488] Connection
> 20470cb275a1d256:3d68601942f3179f from client 172.27.100.70:42540 to server
> hiveserver2-frontend closed. The connection had 2 associated
> session(s).{noformat}
> # Run a statement in impala-shell; it will show that it needs to reconnect
> {noformat}
> default> show tables;
> Caught exception TSocket read 0 bytes, type=<class
> 'thrift.transport.TTransport.TTransportException'> in PingImpalaHS2Service.
> Caught exception [Errno 32] Broken pipe, type=<class 'socket.error'> in
> CloseSession.
> Warning: close session RPC failed: [Errno 32] Broken pipe, <class
> 'socket.error'>
> Connection lost, reconnecting...
> ... then it retries and succeeds{noformat}
> Tracing through the code, it appears that this peek() call returns false:
> {noformat}
> try {
>   bytes_pending = input_->getTransport()->peek();
>   break;
> } catch (const TTransportException& ttx) {{noformat}
> bytes_pending is false, and this causes the connection to be closed.
> This doesn't seem to impact Impala with older Thrift versions, so maybe
> something changed in Thrift 0.16.