[ 
https://issues.apache.org/jira/browse/IMPALA-12699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17807981#comment-17807981
 ] 

Wenzhe Zhou edited comment on IMPALA-12699 at 1/18/24 3:35 AM:
---------------------------------------------------------------

It's great that the half-open issue has been reproduced.
TCP keepalive can help detect a half-open connection. When keepalive is 
enabled on a TCP connection, the client's TCP layer periodically sends 
keepalive probes to the other end while the connection is idle. If the other 
end has already closed the connection, the probes go unanswered (or are 
answered with a reset), so the client closes the connection. See 
https://www.codeproject.com/Articles/37490/Detection-of-Half-Open-Dropped-TCP-IP-Socket-Conne
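
As a rough illustration of the keepalive behaviour described above, here is a minimal Python sketch of turning it on for a client socket. It is not Impala's code (Impala's RPC clients are C++/Thrift). The helper name `enable_keepalive` and the timing values are assumptions chosen for the example, and the `TCP_KEEP*` options are platform-dependent, so they are applied only where the platform exposes them.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Enable TCP keepalive on sock (hypothetical helper, illustrative values).

    idle_s:     seconds of idleness before the first probe is sent
    interval_s: seconds between unanswered probes
    probes:     unanswered probes before the kernel drops the connection
    """
    # SO_KEEPALIVE is the portable switch that turns keepalive probing on.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The tuning knobs below are not available on every platform,
    # so set each one only if the socket module defines it.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),
        ("TCP_KEEPINTVL", interval_s),
        ("TCP_KEEPCNT", probes),
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

With settings like these, a dead peer would be noticed after roughly idle_s + interval_s * probes seconds of silence, instead of the connection staying half open indefinitely.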
  


> Coordinator should retry GetPartialCatalogObject request and apply a recv 
> timeout
> ---------------------------------------------------------------------------------
>
>                 Key: IMPALA-12699
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12699
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Priority: Critical
>
> We have seen trivial GetPartialCatalogObject RPCs hang on the coordinator 
> side, e.g. IMPALA-11409. Due to the piggyback mechanism of fetching metadata 
> in local-catalog mode (see IMPALA-7534 or the comments in 
> CatalogdMetaProvider#loadWithCaching()), a hanging RPC on shared metadata 
> (e.g. the db list or the table list of a db) could block other queries.
> We have also seen thrift RPCs hang in IMPALA-3575. In fact, 
> GetPartialCatalogObject RPCs are read-only requests, so they can be cleanly 
> retried. We should consider using a dedicated catalogd client cache for 
> GetPartialCatalogObject requests and setting an appropriate timeout on the 
> socket.
> The current catalogd client cache:
> https://github.com/apache/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/runtime/exec-env.cc#L224-L226
> The related flags:
> https://github.com/apache/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/runtime/exec-env.cc#L161-L167
> CC [~wzhou]
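
The proposal in the description (retry a read-only request and bound each attempt with a receive timeout) could be sketched roughly as follows. This is a generic Python illustration under stated assumptions, not the Impala implementation: `retry_read_only`, `request_fn`, and the default values are hypothetical names and numbers introduced for the example, and retrying is only safe here because the request is assumed to be read-only.

```python
import socket


def retry_read_only(request_fn, attempts=3, recv_timeout_s=5.0):
    """Call request_fn(recv_timeout_s) until it succeeds or attempts run out.

    request_fn is a hypothetical callable that performs one read-only RPC,
    applies the given timeout to its socket, and raises socket.timeout or
    ConnectionError when the attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for _ in range(attempts):
        try:
            # Each attempt is bounded by the timeout, so a hung peer
            # cannot block the caller forever.
            return request_fn(recv_timeout_s)
        except (socket.timeout, ConnectionError) as err:
            # Safe to retry only because the request has no side effects.
            last_error = err
    raise last_error


def fetch_once(host, port, payload, recv_timeout_s):
    """One attempt: open a fresh connection, send payload, read one reply."""
    with socket.create_connection((host, port), timeout=recv_timeout_s) as sock:
        sock.settimeout(recv_timeout_s)  # applies to the recv() below
        sock.sendall(payload)
        return sock.recv(65536)
```

A caller could combine the two as `retry_read_only(lambda t: fetch_once(host, port, payload, t))`, where `host`, `port`, and `payload` stand in for whatever the real client would use.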


