[ https://issues.apache.org/jira/browse/IMPALA-12699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810533#comment-17810533 ]

Wenzhe Zhou commented on IMPALA-12699:
--------------------------------------

There is only one client cache object for the catalog service on each impalad, 
and it is created with FLAGS_catalog_client_rpc_timeout_ms as both its recv and 
send timeout.
https://github.com/apache/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/runtime/exec-env.cc#L224-L226
The metrics of the catalog client cache are initialized by calling
catalogd_client_cache_->InitMetrics(metrics_.get(), "catalog.server");
https://github.com/apache/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/runtime/exec-env.cc#L430

If we want to apply a recv_timeout to GetPartialCatalogObject RPCs only, then 
we have to create a separate catalogd client cache object. How do we set up 
metrics for this additional client cache object? Currently the metric names 
are fixed.
https://github.com/apache/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/runtime/client-cache.cc#L236-L249
Should we use a different key_prefix?
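
A toy model of the key_prefix idea, assuming metric names are built by appending fixed suffixes to the prefix passed to InitMetrics(). ToyMetricGroup, ToyClientCache, the ".client-cache.*" suffixes and the "catalog.server.partial-fetch" prefix are all hypothetical, not Impala's actual classes or metric names. It only illustrates that two caches, each with its own timeouts, can share one metric registry as long as their prefixes differ.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy metric registry (hypothetical, not Impala's MetricGroup): names must be
// unique, which is why two caches cannot reuse one fixed set of metric names.
struct ToyMetricGroup {
  std::map<std::string, int64_t> values;
  // Returns false when a metric with this name is already registered.
  bool Register(const std::string& name) {
    return values.emplace(name, 0).second;
  }
};

// Toy client cache (hypothetical, not Impala's ClientCache). Each instance
// carries its own socket timeouts and derives its metric names from a prefix.
struct ToyClientCache {
  int recv_timeout_ms;  // 0 is used here to mean "no timeout"
  int send_timeout_ms;

  // Registers this cache's metrics under key_prefix. The suffixes are
  // placeholders, not Impala's real metric names.
  bool InitMetrics(ToyMetricGroup* metrics, const std::string& key_prefix) const {
    assert(metrics != nullptr);
    bool in_use = metrics->Register(key_prefix + ".client-cache.clients-in-use");
    bool total = metrics->Register(key_prefix + ".client-cache.total-clients");
    return in_use && total;
  }
};
```

With the general cache registered under "catalog.server" and a partial-fetch cache under "catalog.server.partial-fetch", both sets of metrics register; registering a third cache under "catalog.server" fails because its names collide.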



> Coordinator should retry GetPartialCatalogObject request and apply a recv timeout
> ---------------------------------------------------------------------------------
>
>                 Key: IMPALA-12699
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12699
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> We have seen trivial GetPartialCatalogObject RPCs hang on the coordinator 
> side, e.g. IMPALA-11409. Due to the piggyback mechanism of fetching metadata 
> in local-catalog mode (see IMPALA-7534 or the comments in 
> CatalogdMetaProvider#loadWithCaching()), a hanging RPC on shared metadata 
> (e.g. the db list or the table list of a db) could block other queries.
> We have also seen thrift RPCs hanging in IMPALA-3575. In fact, 
> GetPartialCatalogObject RPCs are read-only requests. They can be cleanly 
> retried. We should consider using a dedicated catalogd client cache for 
> GetPartialCatalogObject requests and set an appropriate timeout for the 
> socket.
> The current catalogd client cache:
> https://github.com/apache/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/runtime/exec-env.cc#L224-L226
> The related flags:
> https://github.com/apache/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/runtime/exec-env.cc#L161-L167
> CC [~wzhou]
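
A single-threaded toy model of why one hanging fetch can block other queries under piggybacking, assuming the behavior described in the issue: the first request for a key performs the fetch and later requests for the same key wait for its result. ToyPiggybackCache and WaitBounded are hypothetical names and do not reflect CatalogdMetaProvider's actual code; a real cache would also need locking.

```cpp
#include <chrono>
#include <future>
#include <map>
#include <string>

// Toy piggybacked loader (hypothetical; not thread-safe). The first caller
// for a key becomes the owner and must fetch; later callers share its future.
class ToyPiggybackCache {
 public:
  // Returns the shared future for `key`; sets *is_owner when this caller
  // created the entry and is therefore responsible for the fetch RPC.
  std::shared_future<std::string> Get(const std::string& key, bool* is_owner) {
    auto it = inflight_.find(key);
    if (it != inflight_.end()) {
      *is_owner = false;
      return it->second.future;
    }
    Entry& e = inflight_[key];
    e.future = e.promise.get_future().share();
    *is_owner = true;
    return e.future;
  }

  // Called by the owner once its fetch RPC returns.
  void Complete(const std::string& key, const std::string& value) {
    inflight_.at(key).promise.set_value(value);
  }

 private:
  struct Entry {
    std::promise<std::string> promise;
    std::shared_future<std::string> future;
  };
  std::map<std::string, Entry> inflight_;
};

// Waits at most `timeout` for a piggybacked result; returns false while the
// owner's fetch is still outstanding (e.g. its RPC is hanging).
bool WaitBounded(const std::shared_future<std::string>& f,
                 std::chrono::milliseconds timeout) {
  return f.wait_for(timeout) == std::future_status::ready;
}
```

If the owner's RPC never returns, every waiter on that key stays blocked, which is why a bounded receive timeout on the owner's RPC matters even though only one RPC is outstanding.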

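A sketch of the proposed retry, assuming a receive timeout surfaces to the caller as an exception. RecvTimeoutError and RetryReadOnlyRpc are hypothetical names, not Impala or Thrift APIs, and the attempt count and linear backoff are illustrative choices. Retrying blindly is only reasonable because, as the issue notes, GetPartialCatalogObject is read-only.

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>
#include <thread>

// Hypothetical error type standing in for a socket receive timeout
// (a real Thrift client would surface this as a transport-level exception).
struct RecvTimeoutError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Calls `rpc` up to max_attempts times, retrying only on receive timeouts.
// Safe only for requests without side effects on the server.
template <typename Rpc>
auto RetryReadOnlyRpc(Rpc rpc, int max_attempts,
                      std::chrono::milliseconds backoff) -> decltype(rpc()) {
  assert(max_attempts >= 1);
  for (int attempt = 1;; ++attempt) {
    try {
      return rpc();
    } catch (const RecvTimeoutError&) {
      if (attempt >= max_attempts) throw;  // give up and surface the timeout
      std::this_thread::sleep_for(backoff * attempt);  // linear backoff
    }
  }
}
```

Requests with side effects should not be retried this way, because a request that timed out on the client may still have been executed on the server.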


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
