[ 
https://issues.apache.org/jira/browse/IMPALA-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17619734#comment-17619734
 ] 

Wenzhe Zhou edited comment on IMPALA-11674 at 10/18/22 6:59 PM:
----------------------------------------------------------------

TSSLSocket::peek() in TSSLSocket.cpp was changed to call 
TSSLSocket::waitForEvent(). When TSSLSocket::waitForEvent() calls THRIFT_POLL 
(which is the poll() function on Linux) with a positive timeout value, 
THRIFT_POLL returns 0 if the call times out (https://linux.die.net/man/2/poll). 
In that case TSSLSocket::waitForEvent() throws 
TTransportException(TTransportException::TIMED_OUT, "THRIFT_POLL (timed out)").
IsReadTimeoutTException() and IsPeekTimeoutTException() should be updated to 
check for this new type of exception. Otherwise the functions return the wrong 
value for a timeout, which causes TAcceptQueueServer::Peek() to rethrow the 
exception to its caller, TAcceptQueueServer::run(). TAcceptQueueServer::run() 
then writes the log message "AcceptQueueServer client died: THRIFT_POLL (timed 
out)" and closes the connection.
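
To illustrate the analysis above, here is a minimal, self-contained sketch of 
the failure path. It does not use Thrift and is not the actual Impala or 
Thrift code: FakeTransportException, WaitForReadable(), IsPollTimeout() and 
PeekTimesOutOnIdlePipe() are hypothetical stand-ins. Only poll() returning 0 
on timeout and the exception type and message text are taken from the 
analysis.

```cpp
#include <poll.h>
#include <unistd.h>

#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for TTransportException, reduced to the two
// properties this sketch inspects: an exception type and a message.
class FakeTransportException : public std::runtime_error {
 public:
  enum Type { UNKNOWN, TIMED_OUT };
  FakeTransportException(Type type, const std::string& msg)
    : std::runtime_error(msg), type_(type) {}
  Type getType() const { return type_; }

 private:
  Type type_;
};

// Mimics the behaviour described for waitForEvent(): poll() returns 0 when
// the timeout expires with no event, and that is reported as TIMED_OUT.
void WaitForReadable(int fd, int timeout_ms) {
  assert(timeout_ms > 0);  // the analysis concerns positive timeouts only
  struct pollfd pfd;
  pfd.fd = fd;
  pfd.events = POLLIN;
  pfd.revents = 0;
  int ret = poll(&pfd, 1, timeout_ms);
  if (ret == 0) {
    throw FakeTransportException(
        FakeTransportException::TIMED_OUT, "THRIFT_POLL (timed out)");
  }
  if (ret < 0) {
    throw FakeTransportException(
        FakeTransportException::UNKNOWN, "THRIFT_POLL failed");
  }
}

// Hypothetical timeout check in the spirit of IsPeekTimeoutTException():
// it matches both the exception type and the message text, so that a poll
// timeout is treated as "client idle" and not as "client died".
bool IsPollTimeout(const FakeTransportException& e) {
  return e.getType() == FakeTransportException::TIMED_OUT &&
      std::string(e.what()).find("THRIFT_POLL (timed out)") !=
          std::string::npos;
}

// Polls the read end of an empty pipe, which never becomes readable, and
// reports whether the resulting exception was classified as a timeout.
bool PeekTimesOutOnIdlePipe(int timeout_ms) {
  int fds[2];
  if (pipe(fds) != 0) return false;
  bool timed_out = false;
  try {
    WaitForReadable(fds[0], timeout_ms);
  } catch (const FakeTransportException& e) {
    timed_out = IsPollTimeout(e);
  }
  close(fds[0]);
  close(fds[1]);
  return timed_out;
}
```

A caller that gets true from a check like IsPollTimeout() can keep the 
connection open and retry the peek. A check that does not recognize the new 
exception falls through to the rethrow path described above.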

In one reported case, client Thrift connections were closed after 30 seconds, 
with many "AcceptQueueServer client died: THRIFT_POLL (timed out)" messages in 
the coordinator log file. This behavior matches the code analysis above.

[~rizaon] Please verify whether my code analysis makes sense. I think we have 
the same issue with Thrift 0.11.0. 
cc: [~joemcdonnell] 


> Fix IsPeekTimeoutTException and IsReadTimeoutTException for thrift-0.16.0
> -------------------------------------------------------------------------
>
>                 Key: IMPALA-11674
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11674
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.2.0
>            Reporter: Wenzhe Zhou
>            Assignee: Riza Suminto
>            Priority: Major
>
> IMPALA-7825 upgraded Thrift from 0.9.3 to 0.11.0, and IMPALA-11384 
> upgraded the CPP Thrift components from 0.11.0 to 0.16.0. 
> The functions IsPeekTimeoutTException() and IsReadTimeoutTException() in 
> be/src/rpc/thrift-util.cc make assumptions about the implementation of 
> read(), peek(), write() and write_partial() in TSocket.cpp and 
> TSSLSocket.cpp. The functions read() and peek() in TSSLSocket.cpp were 
> changed in versions 0.11.0 and 0.16.0 to throw a different exception on 
> timeout. This causes IsPeekTimeoutTException() and 
> IsReadTimeoutTException() to return the wrong value after the Thrift 
> upgrade, which in turn causes TAcceptQueueServer::Peek() to rethrow the 
> exception to its caller, TAcceptQueueServer::run(), which then closes the 
> connection.


