Quanlong Huang created IMPALA-9214:
--------------------------------------
Summary: REFRESH with sync_ddl may fail with concurrent INVALIDATE
METADATA
Key: IMPALA-9214
URL: https://issues.apache.org/jira/browse/IMPALA-9214
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Quanlong Huang
The call trace for executing a REFRESH statement in Catalogd is
{code:java}
JniCatalog#resetMetadata
CatalogOpExecutor#execResetMetadata
CatalogServiceCatalog#reloadTable
CatalogServiceCatalog#waitForSyncDdlVersion
{code}
In {{CatalogServiceCatalog#reloadTable}}, the {{Table}} object passed in may be
stale if a concurrent reset (i.e. INVALIDATE METADATA) is running. In that case
{{CatalogServiceCatalog#reloadTable}} returns the thrift object of a stale
table. That object is in neither the catalog cache nor {{topicUpdateLog_}}, so
{{waitForSyncDdlVersion}} eventually hangs or runs out of attempts.
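To illustrate why the wait cannot succeed, here is a minimal sketch of a bounded version wait. It is not Impala's implementation: {{BoundedVersionWait}}, {{waitForVersion}}, {{lastSentVersion}} and {{maxAttempts}} are hypothetical names, and the versions 90 and 100 are arbitrary. The sketch only assumes that the waiter polls the last sent version of the object until it reaches the expected version or the attempts are used up.

```java
import java.util.function.LongSupplier;

// Hypothetical, simplified model of a bounded "wait for version" loop.
// It does not reproduce the real waitForSyncDdlVersion logic.
class BoundedVersionWait {
    // Returns true once the last sent version reaches the expected version,
    // or false when all attempts are used up.
    static boolean waitForVersion(LongSupplier lastSentVersion, long expected, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (lastSentVersion.getAsLong() >= expected) {
                return true;
            }
            // A real waiter would block here until the next topic update is sent.
        }
        return false;
    }

    public static void main(String[] args) {
        // The cached object never moves past 90, but the waiter expects 100.
        System.out.println(waitForVersion(() -> 90L, 100L, 5));
        // If an update with a high enough version is sent, the wait succeeds.
        System.out.println(waitForVersion(() -> 100L, 100L, 5));
    }
}
```

With a version that never reaches the expected one, the loop can only end by exhausting its attempts. Without an attempt limit it would never return.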
Here is an example. Suppose table1 is an unpartitioned table that is already
loaded, and two statements, "REFRESH table1" and "INVALIDATE METADATA", run
concurrently.
Thread-1 (Refresh):
* Gets the {{Table}} object in CatalogOpExecutor#execResetMetadata and
goes into {{reloadTable}}. The catalog version of table1 is 50.
* Waits for both the version lock and the table lock here:
[https://github.com/apache/impala/blob/a1588e44980c648cb7f9263cbd0409abfbaeacf7/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L2023]
Thread-2 (Invalidate Metadata):
* Holds the version lock and replaces the whole catalog cache with a new one,
which makes all existing catalog objects stale. The catalog version of table1
in the new cache is now 90.
* Releases the version lock.
Thread-1 (Refresh):
* Gets the version lock and the table lock.
* Gets a new catalog version, let's say 100, then releases the version lock.
* Loads the metadata into the stale {{Table}} object, bumping its catalog
version from 50 to 100.
* Returns the thrift object of the updated stale object from {{reloadTable}}.
* Goes into {{waitForSyncDdlVersion}} and waits until an update of table1 is
sent with a version >= 100.
However, table1 in the catalog cache has version 90. Unless there is another
update on this table, Thread-1 will hang or run out of attempts while waiting
for the expected update.
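The interleaving above can be replayed sequentially with a toy model. This is only a sketch under stated assumptions: {{StaleRefreshDemo}}, {{ToyCatalog}} and the toy {{Table}} class are hypothetical stand-ins, not Impala classes, and the locks are omitted because the steps run in the order listed above. The versions 50, 90 and 100 are the ones from the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of the race; none of these classes are Impala's.
class StaleRefreshDemo {
    // Toy table: only the name and the catalog version matter here.
    static final class Table {
        final String name;
        long version;
        Table(String name, long version) { this.name = name; this.version = version; }
    }

    // Toy catalog: a version counter plus a name -> Table cache.
    static final class ToyCatalog {
        long catalogVersion;
        Map<String, Table> cache = new HashMap<>();

        ToyCatalog(long startVersion) { catalogVersion = startVersion; }

        // Models INVALIDATE METADATA: replace the whole cache with new objects.
        void reset(long newVersion) {
            Map<String, Table> fresh = new HashMap<>();
            for (String name : cache.keySet()) {
                fresh.put(name, new Table(name, newVersion));
            }
            cache = fresh;
            catalogVersion = newVersion;
        }
    }

    // Replays the example and returns {version in cache, version REFRESH waits for}.
    static long[] replay() {
        ToyCatalog catalog = new ToyCatalog(50);
        catalog.cache.put("table1", new Table("table1", 50));

        // Thread-1: gets the Table object, then blocks on the locks.
        Table held = catalog.cache.get("table1");

        // Thread-2: replaces the cache; table1 in the new cache has version 90.
        catalog.reset(90);

        // Thread-1: gets a new catalog version and reloads the stale object.
        catalog.catalogVersion = 100;
        held.version = catalog.catalogVersion;

        long cachedVersion = catalog.cache.get("table1").version;
        return new long[] {cachedVersion, held.version};
    }

    public static void main(String[] args) {
        long[] result = replay();
        System.out.println("cached=" + result[0] + " awaited=" + result[1]);
    }
}
```

The object that Thread-1 updated is no longer the one in the cache, so the cached version stays at 90 while the statement waits for 100.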