[ https://issues.apache.org/jira/browse/IMPALA-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17626390#comment-17626390 ]
ASF subversion and git services commented on IMPALA-11674:
----------------------------------------------------------
Commit f917dc111c65dfe7142361660e877d20c08f875e in impala's branch
refs/heads/dependabot/maven/fe/org.eclipse.jetty-jetty-server-9.4.41.v20210516
from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f917dc111 ]
IMPALA-11674: Fix timeout detection for TSSLSocket
Functions IsPeekTimeoutTException() and IsReadTimeoutTException() in
be/src/rpc/thrift-util.cc make assumptions about the implementation of
read(), peek(), write() and write_partial() in TSocket.cpp and
TSSLSocket.cpp. The functions read() and peek() in TSSLSocket.cpp were
changed in versions 0.11.0 and 0.16.0 to throw a different exception on
timeout. This causes IsPeekTimeoutTException() and
IsReadTimeoutTException() to return the wrong value after the Thrift
upgrade, which in turn causes TAcceptQueueServer::Peek() to rethrow the
exception to its caller, TAcceptQueueServer::run(), which then closes
the connection, ignoring the idle_session_timeout query option.
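The failure mode described above can be illustrated with a minimal,
self-contained sketch. It does not use the real Thrift classes:
FakeTransportException, ErrorKind, and both message fragments are
hypothetical stand-ins, and the assumption that the SSL path reports a
different error kind is only for illustration. The point is that a check
written against one socket implementation's error text silently misses
the other's.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for a transport error category; NOT the real Thrift enum.
enum class ErrorKind { kUnknown, kTimedOut, kInternalError };

// Stand-in for a transport exception; NOT the real Thrift class.
class FakeTransportException : public std::runtime_error {
 public:
  FakeTransportException(ErrorKind kind, const std::string& msg)
      : std::runtime_error(msg), kind_(kind) {}
  ErrorKind kind() const { return kind_; }

 private:
  ErrorKind kind_;
};

// Hypothetical message fragments, one per socket implementation.
// The real strings live in TSocket.cpp and TSSLSocket.cpp.
const char* const kPlainTimeoutText = "plain socket: timed out";
const char* const kSslTimeoutText = "ssl socket: timed out";

// Returns true if the exception message contains the given fragment.
bool Contains(const FakeTransportException& e, const char* fragment) {
  return std::string(e.what()).find(fragment) != std::string::npos;
}

// Check that only knows how the plain socket reports a timeout.
bool IsTimeoutOld(const FakeTransportException& e) {
  return e.kind() == ErrorKind::kTimedOut && Contains(e, kPlainTimeoutText);
}

// Check that also recognizes the (assumed) SSL socket form.
bool IsTimeoutFixed(const FakeTransportException& e) {
  return IsTimeoutOld(e) ||
         (e.kind() == ErrorKind::kInternalError &&
          Contains(e, kSslTimeoutText));
}
```

With these stand-ins, an SSL-style timeout is rejected by IsTimeoutOld()
but accepted by IsTimeoutFixed(), while a plain-socket timeout is
accepted by both.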
The issue was reproducible through the following scenario:
1. From the local development environment, start the Impala cluster with
SSL enabled and idle_client_poll_period_s set to 5 seconds.
export CERT_DIR="$IMPALA_HOME/be/src/testutil"
export SSL_ARGS="--ssl_client_ca_certificate=$CERT_DIR/server-cert.pem
--ssl_server_certificate=$CERT_DIR/server-cert.pem
--ssl_private_key=$CERT_DIR/server-key.pem
--hostname=localhost"
./bin/start-impala-cluster.py --state_store_args="$SSL_ARGS" \
--catalogd_args="$SSL_ARGS" \
--impalad_args="$SSL_ARGS --idle_client_poll_period_s=5"
2. Run impala-shell with a higher idle_session_timeout query option
impala-shell.sh --ssl -Q idle_session_timeout=100
3. Run a simple query like "show databases" and rerun it after 15
seconds have passed.
The second query run will fail with the following error message in impala-shell:
[localhost:21050] default> show databases;
Caught exception TLS/SSL connection has been closed (EOF) (_ssl.c:1829),
type=<class 'ssl.SSLZeroReturnError'> in CloseSession.
Warning: close session RPC failed: TLS/SSL connection has been closed (EOF)
(_ssl.c:1829), <class 'ssl.SSLZeroReturnError'>
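Why the scenario above fails after roughly one poll period, rather than
after idle_session_timeout, can be sketched with a toy model. This is
not Impala's actual server loop: SimulateIdleClient, Outcome, and the
rule "wake up once per poll period" are simplifying assumptions made
only for illustration.

```cpp
// Possible results of the toy simulation (hypothetical names).
enum class Outcome { kStillOpen, kIdleTimeoutReached, kClosedByError };

// Toy model: the server wakes every poll_period_s seconds while the
// client stays idle for idle_gap_s seconds. Each wake-up corresponds to
// a peek that timed out. If that timeout is not recognized as a timeout,
// it is treated as an error and the connection is closed immediately.
Outcome SimulateIdleClient(int poll_period_s, int idle_session_timeout_s,
                           int idle_gap_s, bool timeout_recognized) {
  // Assumption: a non-positive poll period means no periodic wake-ups.
  if (poll_period_s <= 0) return Outcome::kStillOpen;
  for (int elapsed = poll_period_s; elapsed <= idle_gap_s;
       elapsed += poll_period_s) {
    if (!timeout_recognized) return Outcome::kClosedByError;
    if (elapsed >= idle_session_timeout_s) {
      return Outcome::kIdleTimeoutReached;
    }
  }
  return Outcome::kStillOpen;
}
```

With the values from the scenario (poll period 5, idle_session_timeout
100, 15 seconds idle), the model keeps the connection open when the
timeout is recognized and closes it at the first wake-up when it is not.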
This patch fixes the expected error message in IsReadTimeoutTException and
IsPeekTimeoutTException so that they correctly detect timeout errors from
TSSLSocket. Additionally, this patch fixes a typo in
NEW_THRIFT_VERSION_MSG.
Testing:
- Redo the scenario manually, with and without SSL, and confirm that
the second query completes without error.
- Add test_thrift_socket.py to begin verifying the
IsPeekTimeoutTException function.
Change-Id: I6ad168a1c96d751a3c50d924e6ecaf6404e589ab
Reviewed-on: http://gerrit.cloudera.org:8080/19157
Reviewed-by: Wenzhe Zhou <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Zoltan Borok-Nagy <[email protected]>
> Fix IsPeekTimeoutTException and IsReadTimeoutTException for thrift-0.16.0
> -------------------------------------------------------------------------
>
> Key: IMPALA-11674
> URL: https://issues.apache.org/jira/browse/IMPALA-11674
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.2.0
> Reporter: Wenzhe Zhou
> Assignee: Riza Suminto
> Priority: Major
> Fix For: Impala 4.2.0
>
>
> IMPALA-7825 upgraded Thrift from 0.9.3 to 0.11.0, and IMPALA-11384
> upgraded the C++ Thrift components from 0.11.0 to 0.16.0.
> Functions IsPeekTimeoutTException() and IsReadTimeoutTException() in
> be/src/rpc/thrift-util.cc make assumptions about the implementation of read(),
> peek(), write() and write_partial() in TSocket.cpp and TSSLSocket.cpp. The
> functions read() and peek() in TSSLSocket.cpp were changed in versions 0.11.0
> and 0.16.0 to throw a different exception on timeout. This causes
> IsPeekTimeoutTException() and IsReadTimeoutTException() to return the wrong
> value after the Thrift upgrade, which in turn causes
> TAcceptQueueServer::Peek() to rethrow the exception to its caller,
> TAcceptQueueServer::run(), which then closes the connection.