Quanlong Huang created IMPALA-12699:
---------------------------------------

             Summary: Coordinator should retry GetPartialCatalogObject request and apply a recv timeout
                 Key: IMPALA-12699
                 URL: https://issues.apache.org/jira/browse/IMPALA-12699
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
            Reporter: Quanlong Huang


We have seen trivial GetPartialCatalogObject RPCs hang on the coordinator side,
e.g. IMPALA-11409. Local-catalog mode uses a piggyback mechanism for fetching
metadata (see IMPALA-7534 or the comments in
CatalogdMetaProvider#loadWithCaching()). Because of it, one hanging RPC on shared
metadata (e.g. the db list, or the table list of a db) can block every other query
waiting for the same metadata.
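To make the blocking effect concrete, here is a minimal C++ sketch of piggybacked loading. It assumes a hypothetical `PiggybackLoader` class and `WaitFor()` helper. This is not Impala's code; the real logic lives in the Java method `CatalogdMetaProvider#loadWithCaching()`. Concurrent requests for the same key share one in-flight future, so if that single RPC never returns, every waiter is stuck with it.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <map>
#include <mutex>
#include <optional>
#include <string>

// Hypothetical sketch: concurrent lookups of the same key piggyback on one
// in-flight load instead of each issuing its own RPC.
class PiggybackLoader {
 public:
  // Starts an asynchronous load (e.g. an RPC to catalogd) for a key.
  using StartLoadFn =
      std::function<std::shared_future<std::string>(const std::string&)>;

  explicit PiggybackLoader(StartLoadFn start) : start_(std::move(start)) {}

  // Returns the shared future for `key`, starting a load only if none exists.
  // Simplification: entries are never evicted, and start_ runs under the lock.
  std::shared_future<std::string> Get(const std::string& key) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = in_flight_.find(key);
    if (it != in_flight_.end()) return it->second;  // piggyback on existing load
    ++loads_started_;
    std::shared_future<std::string> f = start_(key);
    in_flight_.emplace(key, f);
    return f;
  }

  int loads_started() const {
    std::lock_guard<std::mutex> lock(mu_);
    return loads_started_;
  }

 private:
  StartLoadFn start_;
  mutable std::mutex mu_;
  std::map<std::string, std::shared_future<std::string>> in_flight_;
  int loads_started_ = 0;
};

// Waits at most `timeout` for a shared result; std::nullopt means "still hung".
std::optional<std::string> WaitFor(const std::shared_future<std::string>& f,
                                   std::chrono::milliseconds timeout) {
  if (f.wait_for(timeout) != std::future_status::ready) return std::nullopt;
  return f.get();
}
```

The bounded `WaitFor()` only shows that the waiters cannot make progress. It does not unstick the shared load, which is why the fix has to bound and retry the RPC itself.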

We have also seen Thrift RPCs hang in IMPALA-3575. GetPartialCatalogObject RPCs are
read-only requests, so they can be safely retried. We should consider using a
dedicated catalogd client cache for GetPartialCatalogObject requests and setting an
appropriate recv timeout on its sockets.
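Here is a minimal sketch of the proposed retry. It assumes a hypothetical `CallReadOnlyWithRetry()` helper and `RpcStatus` enum, neither of which exists in Impala. Only a recv timeout is retried. That is safe here because GetPartialCatalogObject is read-only. A write RPC that timed out might already have been applied.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Hypothetical outcome of a single RPC attempt.
enum class RpcStatus { kOk, kRecvTimeout, kFatal };

struct RpcResult {
  RpcStatus status;
  std::string payload;  // meaningful only when status == kOk
};

// Retries an idempotent (read-only) request when an attempt hits the recv
// timeout. Success and non-timeout errors are returned immediately.
RpcResult CallReadOnlyWithRetry(
    const std::function<RpcResult(int attempt)>& attempt_fn, int max_attempts,
    std::chrono::milliseconds backoff) {
  assert(max_attempts >= 1);
  RpcResult last{RpcStatus::kFatal, ""};
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    last = attempt_fn(attempt);
    if (last.status != RpcStatus::kRecvTimeout) return last;
    if (attempt < max_attempts) std::this_thread::sleep_for(backoff);
  }
  return last;  // every attempt timed out
}
```

Each attempt should use a fresh connection. A socket that timed out mid-response may still receive the stale reply to the earlier request.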

The current catalogd client cache:
https://github.com/apache/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/runtime/exec-env.cc#L224-L226
The related flags:
https://github.com/apache/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/runtime/exec-env.cc#L161-L167
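At the socket level, a recv timeout bounds how long a blocking read waits for the peer. The sketch below uses the POSIX `SO_RCVTIMEO` option directly. `SetRecvTimeoutMs()` and `RecvTimedOut()` are hypothetical helpers, and how a dedicated client cache would expose the timeout value as a flag is left open. Thrift's C++ `TSocket::setRecvTimeout()` offers the same control at the transport layer.

```cpp
#include <cerrno>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

// Applies a receive timeout to a socket. Returns 0 on success, -1 on error.
int SetRecvTimeoutMs(int fd, int timeout_ms) {
  timeval tv{};
  tv.tv_sec = timeout_ms / 1000;
  tv.tv_usec = (timeout_ms % 1000) * 1000;
  return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

// Returns true if a blocking recv() on `fd` gave up because the timeout
// expired (the peer sent nothing in time) instead of blocking forever.
bool RecvTimedOut(int fd) {
  char buf[16];
  ssize_t n = recv(fd, buf, sizeof(buf), 0);
  return n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);
}
```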

CC [~wzhou]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
