Quanlong Huang created IMPALA-14220:
---------------------------------------

             Summary: IsActive checks blocked by the getCatalogDelta operation 
when there are slow DDLs
                 Key: IMPALA-14220
                 URL: https://issues.apache.org/jira/browse/IMPALA-14220
             Project: IMPALA
          Issue Type: Bug
          Components: Backend, Catalog
            Reporter: Quanlong Huang


When catalogd HA is enabled, catalogd checks whether it's the active one 
before serving each request, i.e. in 
[AcceptRequest()|https://github.com/apache/impala/blob/8d56eea72518aa11a36aa086dc8961bc8cdbd1fd/be/src/catalog/catalog-server.cc#L593]:
{code:cpp}
  Status AcceptRequest(CatalogServiceVersion::type client_version) {
    ...
    } else if (FLAGS_enable_catalogd_ha && !catalog_server_->IsActive()) {
      status = Status(Substitute("Request for Catalog service is rejected since "
          "catalogd $0 is in standby mode", server_address_));
    }
{code}
This check requires holding the catalog_lock_:
{code:cpp}
bool CatalogServer::IsActive() {
  lock_guard<mutex> l(catalog_lock_);
  return is_active_;
}{code}
[https://github.com/apache/impala/blob/8d56eea72518aa11a36aa086dc8961bc8cdbd1fd/be/src/catalog/catalog-server.cc#L896]

This lock is also held by 
[GatherCatalogUpdatesThread|https://github.com/apache/impala/blob/8d56eea72518aa11a36aa086dc8961bc8cdbd1fd/be/src/catalog/catalog-server.cc#L905]
 (a.k.a. the topic update thread), which invokes the JNI method GetCatalogDelta 
to collect catalog updates.

It's known that collecting catalog updates can be blocked by slow DDLs that 
hold the table lock for a long time (IMPALA-6671). The topic update thread 
usually waits for 1 minute (configured by topic_update_tbl_max_wait_time_ms / 
2) on the table lock and then skips it with a warning like this:
{noformat}
Table tpch.lineitem (version=2373, lastSeen=2373) is skipping topic update 
(2387, 2388] due to lock contention{noformat}
If the table hasn't been collected 3 consecutive times (configured by 
catalog_max_lock_skipped_topic_updates), the topic update thread will wait 
indefinitely on it the next time.
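
The wait-then-skip behaviour described above can be sketched roughly as follows. This is a simplified illustration, not Impala's actual code (the real logic lives in the Java catalog): TableSketch, CollectTable and CountSkippedRounds are hypothetical names, max_wait stands in for topic_update_tbl_max_wait_time_ms / 2, max_skipped stands in for catalog_max_lock_skipped_topic_updates, and resetting the skip counter after a successful collection is an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a catalog table; not Impala's actual class.
struct TableSketch {
  std::timed_mutex lock;    // plays the role of the table lock a DDL may hold
  int skipped_updates = 0;  // consecutive topic updates that skipped this table
};

enum class CollectResult { kCollected, kSkipped };

// One collection attempt for one table during a topic update.
inline CollectResult CollectTable(TableSketch& tbl,
    std::chrono::milliseconds max_wait, int max_skipped) {
  assert(max_skipped > 0);
  if (tbl.skipped_updates >= max_skipped) {
    tbl.lock.lock();  // Skipped too often: wait without a timeout.
  } else if (!tbl.lock.try_lock_for(max_wait)) {
    ++tbl.skipped_updates;  // Lock contention: skip the table this round.
    return CollectResult::kSkipped;
  }
  // ... the table's metadata would be collected here ...
  tbl.skipped_updates = 0;  // Assumption: a successful collection resets the count.
  tbl.lock.unlock();
  return CollectResult::kCollected;
}

// Simulates a slow DDL that holds the table lock across `rounds` topic updates
// and returns how many of those rounds skipped the table. `rounds` must not
// exceed `max_skipped`, otherwise the untimed wait would never return here.
inline int CountSkippedRounds(int rounds, std::chrono::milliseconds max_wait,
    int max_skipped) {
  assert(rounds <= max_skipped);
  TableSketch tbl;
  std::atomic<bool> held{false};
  std::atomic<bool> release{false};
  std::thread ddl([&] {
    std::lock_guard<std::timed_mutex> l(tbl.lock);
    held = true;
    while (!release) std::this_thread::sleep_for(std::chrono::milliseconds(1));
  });
  while (!held) std::this_thread::yield();
  int skipped = 0;
  for (int i = 0; i < rounds; ++i) {
    if (CollectTable(tbl, max_wait, max_skipped) == CollectResult::kSkipped) {
      ++skipped;
    }
  }
  release = true;
  ddl.join();
  return skipped;
}
```

In this sketch every timed attempt blocks the caller for up to max_wait, and the attempt after max_skipped skips blocks until the DDL releases the lock, which is the case that can stall a whole round of updates.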

So when the topic update thread is slow in collecting one round of updates, it 
holds the catalog_lock_ for a long time and blocks all new requests on this 
catalogd. This impacts the performance of all queries that require loading 
metadata from catalogd.
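
The contention can be reproduced in miniature with the sketch below. It is not Impala code: CatalogServerSketch, GatherCatalogUpdates and MeasureIsActiveLatency are hypothetical names, and the slow GetCatalogDelta call is simulated with a sleep. It only shows that an IsActive() call which takes the same mutex has to wait for the whole round of updates to finish.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical miniature of the locking pattern; not Impala's CatalogServer.
class CatalogServerSketch {
 public:
  // Same shape as the IsActive() quoted above: reads the flag under the lock.
  bool IsActive() {
    std::lock_guard<std::mutex> l(catalog_lock_);
    return is_active_;
  }

  // Stands in for one round of the topic update thread: catalog_lock_ is held
  // for the whole (simulated) GetCatalogDelta call.
  void GatherCatalogUpdates(std::chrono::milliseconds delta_duration) {
    std::lock_guard<std::mutex> l(catalog_lock_);
    lock_held_ = true;
    std::this_thread::sleep_for(delta_duration);
  }

  bool lock_held() const { return lock_held_; }

 private:
  std::mutex catalog_lock_;
  bool is_active_ = true;
  std::atomic<bool> lock_held_{false};
};

// Returns how long a single IsActive() call takes while a round of updates
// lasting `delta_duration` is in progress.
inline std::chrono::milliseconds MeasureIsActiveLatency(
    std::chrono::milliseconds delta_duration) {
  assert(delta_duration.count() > 0);
  CatalogServerSketch server;
  std::thread topic_update(
      [&] { server.GatherCatalogUpdates(delta_duration); });
  // Wait until the simulated topic update thread owns catalog_lock_.
  while (!server.lock_held()) std::this_thread::yield();
  auto start = std::chrono::steady_clock::now();
  bool active = server.IsActive();  // Blocks until the round finishes.
  auto elapsed = std::chrono::steady_clock::now() - start;
  topic_update.join();
  assert(active);
  return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed);
}
```

Calling MeasureIsActiveLatency(std::chrono::milliseconds(200)) should report a latency close to the full 200 ms, even though the check itself only reads one boolean.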



