[ https://issues.apache.org/jira/browse/IMPALA-12699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810533#comment-17810533 ]
Wenzhe Zhou commented on IMPALA-12699:
--------------------------------------

There is only one client cache object for the catalog service on each impalad. It is created with FLAGS_catalog_client_rpc_timeout_ms as both the recv and the send timeout.
https://github.com/apache/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/runtime/exec-env.cc#L224-L226

The metrics of the catalog client cache object are initialized by calling
catalogd_client_cache_->InitMetrics(metrics_.get(), "catalog.server");
https://github.com/apache/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/runtime/exec-env.cc#L430

If we want to apply a recv_timeout to the GetPartialCatalogObject RPC only, we have to create a separate catalogd client cache object. How do we set metrics for this additional client cache object? Currently the metric names are fixed.
https://github.com/apache/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/runtime/client-cache.cc#L236-L249
Should we use a different key_prefix?

> Coordinator should retry GetPartialCatalogObject request and apply a recv timeout
> ---------------------------------------------------------------------------------
>
>                 Key: IMPALA-12699
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12699
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> We have seen trivial GetPartialCatalogObject RPCs hanging on the coordinator
> side, e.g. IMPALA-11409. Due to the piggyback mechanism of fetching metadata
> in local-catalog mode (see IMPALA-7534 or comments in
> CatalogdMetaProvider#loadWithCaching()), a hanging RPC on shared metadata
> (e.g. the db list or the table list of a db) could block other queries.
> We have also seen thrift RPCs hanging in IMPALA-3575. In fact,
> GetPartialCatalogObject RPCs are read-only requests, so they can be cleanly
> retried.
> We should consider using a dedicated catalogd client cache for
> GetPartialCatalogObject requests and set an appropriate timeout for the
> socket.
> The current catalogd client cache:
> https://github.com/apache/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/runtime/exec-env.cc#L224-L226
> The related flags:
> https://github.com/apache/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/runtime/exec-env.cc#L161-L167
> CC [~wzhou]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
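
To make the key_prefix question in the comment concrete, here is a minimal, self-contained sketch of why a second client cache would need its own prefix when the metric name suffixes are fixed. It is a toy model, not Impala code: ToyMetricRegistry and ToyClientCache are hypothetical stand-ins, the two suffixes are assumed for illustration, and the prefix "catalog.server.partial-object" and the timeout values are made up. Only the "catalog.server" prefix comes from the comment above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a metric registry: metric name -> value.
// Names must be unique, so registering an existing name fails.
class ToyMetricRegistry {
 public:
  // Returns true if the name was newly registered, false if it already existed.
  bool Register(const std::string& name) {
    return metrics_.emplace(name, 0).second;
  }

  bool Contains(const std::string& name) const {
    return metrics_.count(name) > 0;
  }

  std::size_t size() const { return metrics_.size(); }

 private:
  std::map<std::string, int64_t> metrics_;
};

// Hypothetical stand-in for a client cache whose metric suffixes are fixed,
// so the only way to tell two caches apart is the key prefix.
class ToyClientCache {
 public:
  explicit ToyClientCache(int32_t recv_timeout_ms)
      : recv_timeout_ms_(recv_timeout_ms) {}

  // Registers one metric per fixed suffix under key_prefix.
  // Returns false if any of the names was already taken.
  bool InitMetrics(ToyMetricRegistry* registry, const std::string& key_prefix) {
    assert(registry != nullptr);
    // Suffixes are assumed for illustration only.
    bool ok = registry->Register(key_prefix + ".client-cache.clients-in-use");
    ok = registry->Register(key_prefix + ".client-cache.total-clients") && ok;
    return ok;
  }

  int32_t recv_timeout_ms() const { return recv_timeout_ms_; }

 private:
  int32_t recv_timeout_ms_;
};
```

In this model, a general-purpose cache and a dedicated cache with its own recv timeout can share one registry only if they are initialized with different prefixes. Calling InitMetrics twice with "catalog.server" returns false for the second cache, while a distinct prefix such as the hypothetical "catalog.server.partial-object" succeeds and yields four separate metrics.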