Quanlong Huang created IMPALA-12699:
---------------------------------------
Summary: Coordinator should retry GetPartialCatalogObject request
and apply a recv timeout
Key: IMPALA-12699
URL: https://issues.apache.org/jira/browse/IMPALA-12699
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Quanlong Huang
We have seen trivial GetPartialCatalogObject RPCs hang on the coordinator side,
e.g. IMPALA-11409. Because of the piggyback mechanism for fetching metadata in
local-catalog mode (see IMPALA-7534 or the comments in
CatalogdMetaProvider#loadWithCaching()), a hanging RPC for shared metadata (e.g.
the db list, or the table list of a db) can block other queries.
We have also seen Thrift RPCs hang in IMPALA-3575. GetPartialCatalogObject RPCs
are read-only requests, so they can be retried safely. We should consider using
a dedicated catalogd client cache for GetPartialCatalogObject requests and
setting an appropriate recv timeout on its sockets.
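
A minimal sketch of the proposed behavior, not Impala's actual client code: a
read-only request is retried a bounded number of times, and each attempt is
limited by a receive timeout. The names RetryReadOnlyRpc, Attempt and
RetryResult are hypothetical, and the RPC itself is stood in for by a callback
that returns no value when the timeout expires. In a real Thrift client the
per-attempt bound would come from the socket receive timeout (for example
TSocket::setRecvTimeout in the Thrift C++ library); here it is only passed
through to the callback.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Hypothetical stand-in for one RPC attempt: returns the response, or an
// empty optional if the receive timeout expired before the server answered.
using Attempt = std::function<std::optional<std::string>(int recv_timeout_ms)>;

struct RetryResult {
  std::optional<std::string> response;  // empty if every attempt timed out
  int attempts = 0;                     // number of attempts actually made
};

// Retries a read-only (idempotent) request up to max_attempts times.
// Each attempt is bounded by recv_timeout_ms, so a hung server costs at most
// max_attempts * recv_timeout_ms instead of blocking the caller forever.
RetryResult RetryReadOnlyRpc(const Attempt& attempt, int max_attempts,
                             int recv_timeout_ms) {
  assert(max_attempts > 0 && recv_timeout_ms > 0);
  RetryResult result;
  while (result.attempts < max_attempts) {
    ++result.attempts;
    result.response = attempt(recv_timeout_ms);
    if (result.response.has_value()) break;  // success, stop retrying
  }
  return result;
}
```

Retrying is only safe here because the request does not modify catalog state;
the same loop would not be appropriate for DDL or other mutating RPCs.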
The current catalogd client cache:
https://github.com/apache/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/runtime/exec-env.cc#L224-L226
The related flags:
https://github.com/apache/impala/blob/cdac777c51febc99500b8426c2b3aabc7e9addd7/be/src/runtime/exec-env.cc#L161-L167
CC [~wzhou]