[ 
https://issues.apache.org/jira/browse/IMPALA-12699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17807575#comment-17807575
 ] 

Wenzhe Zhou commented on IMPALA-12699:
--------------------------------------

For the Thrift server, we use TAcceptQueueServer and repeatedly call 
[TAcceptQueueServer::Peek()|https://github.com/apache/impala/blob/master/be/src/rpc/TAcceptQueueServer.cpp#L87].
  
In TAcceptQueueServer::Peek(), we change the [receive 
timeout|https://github.com/apache/impala/blob/master/be/src/rpc/TAcceptQueueServer.cpp#L146C15-L146C15]
 and then call 
input_->[getTransport()->peek()|https://github.com/apache/impala/blob/master/be/src/rpc/TAcceptQueueServer.cpp#L153].
 
So the Thrift server can time out if the connection stays idle for a long time.
But on the Thrift client side, we don't have such a timeout mechanism. We set 
[catalog_client_rpc_timeout_ms to 0 
|https://github.com/apache/impala/blob/master/be/src/runtime/exec-env.cc#L163] 
for Catalog Thrift RPCs on the client side. This makes the Thrift client of a 
Catalog Thrift RPC wait forever if the connection-closing packet is lost. 
I agree that we should change the timeout value for the GetPartialCatalogObject RPC.
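
To illustrate why a client-side receive timeout matters, here is a minimal 
sketch using plain POSIX sockets rather than the Thrift or Impala classes. 
The helper names SetRecvTimeoutMs and ReadTimesOutOnIdlePeer are hypothetical, 
and a socketpair stands in for the coordinator-to-catalogd connection. It only 
shows that a read on a silent peer returns once SO_RCVTIMEO is set, instead of 
blocking indefinitely.

```cpp
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>

// Hypothetical helper (not an Impala or Thrift function): applies a receive
// timeout to a socket so that a blocked read eventually returns.
// On Linux, a zero timeval means "never time out", so require a positive value.
bool SetRecvTimeoutMs(int fd, int timeout_ms) {
  assert(timeout_ms > 0);
  timeval tv{};
  tv.tv_sec = timeout_ms / 1000;
  tv.tv_usec = (timeout_ms % 1000) * 1000;
  return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0;
}

// Returns true if a read on an idle connection gave up because of the timeout.
// The peer end of the socketpair never sends anything, much like a peer whose
// connection-closing packet was lost on the way to the client.
bool ReadTimesOutOnIdlePeer(int timeout_ms) {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return false;
  bool timed_out = false;
  if (SetRecvTimeoutMs(fds[0], timeout_ms)) {
    char buf[1];
    ssize_t n = recv(fds[0], buf, sizeof(buf), 0);
    // With SO_RCVTIMEO set, an expired wait is reported as EAGAIN/EWOULDBLOCK.
    timed_out = (n == -1) && (errno == EAGAIN || errno == EWOULDBLOCK);
  }
  close(fds[0]);
  close(fds[1]);
  return timed_out;
}
```

Calling ReadTimesOutOnIdlePeer(50) waits roughly 50 ms and then reports the 
timeout. Without the setsockopt call, the same recv would block until the peer 
sends data or closes the connection.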

> Coordinator should retry GetPartialCatalogObject request and apply a recv 
> timeout
> ---------------------------------------------------------------------------------
>
>                 Key: IMPALA-12699
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12699
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Priority: Critical
>
> We have seen trivial GetPartialCatalogObject RPCs hanging on the coordinator 
> side, e.g. IMPALA-11409. Due to the piggyback mechanism of fetching metadata 
> in local-catalog mode (see IMPALA-7534 or comments in 
> CatalogdMetaProvider#loadWithCaching()), a hanging RPC on shared metadata 
> (e.g. db list or table list of a db) could block other queries.
> We have also seen thrift RPCs hanging in IMPALA-3575. In fact, 
> GetPartialCatalogObject RPCs are read-only requests. They can be cleanly 
> retried. We should consider using a dedicated catalogd client cache for 
> GetPartialCatalogObject requests and set an appropriate timeout for the 
> socket.
> The current catalogd client cache:
> https://github.com/apache/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/runtime/exec-env.cc#L224-L226
> The related flags:
> https://github.com/apache/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/runtime/exec-env.cc#L161-L167
> CC [~wzhou]
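
Since the issue description notes that these requests are read-only and can be 
cleanly retried, a bounded retry loop could look roughly like the sketch below. 
This is not Impala's implementation: RetryReadOnly is a hypothetical helper, 
and the attempt callable stands in for "send the request and wait for the reply 
with a socket timeout". A real client would presumably also need to reopen the 
connection between attempts.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: calls `attempt` up to `max_attempts` times, sleeping
// `backoff` between failed tries. `attempt` returns true on success and false
// on a timeout or transport error. Only appropriate for idempotent, read-only
// requests, because a timed-out request may still have reached the server.
// Returns the number of attempts used, or -1 if every attempt failed.
template <typename Attempt>
int RetryReadOnly(Attempt attempt, int max_attempts,
                  std::chrono::milliseconds backoff) {
  assert(max_attempts > 0);
  for (int i = 1; i <= max_attempts; ++i) {
    if (attempt()) return i;
    // Do not sleep after the final failure; just report it.
    if (i < max_attempts) std::this_thread::sleep_for(backoff);
  }
  return -1;
}
```

For example, RetryReadOnly(send_request, 3, std::chrono::milliseconds(100)) 
would try a hypothetical send_request callable at most three times with a 
100 ms pause between failures.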



