[
https://issues.apache.org/jira/browse/IMPALA-14856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18069035#comment-18069035
]
ASF subversion and git services commented on IMPALA-14856:
----------------------------------------------------------
Commit 957c398ad08930878b830a1eb1c1e6138a2a6929 in impala's branch
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=957c398ad ]
IMPALA-12715: deflake test_allow_metadata_update_local_catalog
The test expects that DESCRIBE on an unloaded table loads the table
metadata on the coordinator side. However, this can't be guaranteed in
local catalog mode, since catalog updates (invalidations) from the
statestore could invalidate the cache (IMPALA-14856).
The test is flaky because DESCRIBE uses the table metadata twice: once in
the analysis phase (StmtMetadataLoader) and once in execution
(Frontend.doDescribeTable()). If the statestore update arrives between
them, the second use loads the table metadata again, so the cache item
exists. However, if the statestore update arrives after the second
use, the cache item is removed and the test fails.
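The two orderings can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `DescribeRaceSketch` and its methods are hypothetical and stand in for the coordinator cache, the two metadata uses, and the statestore invalidation. None of it is Impala code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Toy model of the coordinator-side table cache (hypothetical, not Impala's classes). */
public class DescribeRaceSketch {
  private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

  /** Models one use of the table metadata: load it on a cache miss. */
  void useTableMeta(String table) {
    cache.computeIfAbsent(table, t -> "meta of " + t);
  }

  /** Models a statestore update: the entry is dropped unconditionally. */
  void onStatestoreUpdate(String table) {
    cache.remove(table);
  }

  boolean isCached(String table) {
    return cache.containsKey(table);
  }

  /** The update arrives between the analysis-time and execution-time uses. */
  static boolean updateBetweenUses() {
    DescribeRaceSketch c = new DescribeRaceSketch();
    c.useTableMeta("db.tbl");        // analysis phase
    c.onStatestoreUpdate("db.tbl");  // invalidation
    c.useTableMeta("db.tbl");        // execution reloads the entry
    return c.isCached("db.tbl");
  }

  /** The update arrives after both uses, so nothing reloads the entry. */
  static boolean updateAfterSecondUse() {
    DescribeRaceSketch c = new DescribeRaceSketch();
    c.useTableMeta("db.tbl");        // analysis phase
    c.useTableMeta("db.tbl");        // execution hits the cache
    c.onStatestoreUpdate("db.tbl");  // invalidation
    return c.isCached("db.tbl");
  }

  public static void main(String[] args) {
    System.out.println("update between uses, cached: " + updateBetweenUses());
    System.out.println("update after second use, cached: " + updateAfterSecondUse());
  }
}
```

In this model the first ordering leaves the entry cached and the second leaves the cache empty, which matches the pass and fail cases described above.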
This patch deflakes the test by using REFRESH on the unloaded table to
trigger metadata loading on the catalogd side. It runs with sync_ddl=true
to make sure the coordinator processes the statestore update, then runs a
DESCRIBE to make sure the table metadata is loaded on the coordinator side.
Tests:
- ran the test locally 200 times.
- ran test_ranger.py
Change-Id: I538c30bf4d1439108bd1cd1cb64208281974c1f6
Reviewed-on: http://gerrit.cloudera.org:8080/24135
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Michael Smith <[email protected]>
> Unnecessary invalidation for just loaded table meta
> ---------------------------------------------------
>
> Key: IMPALA-14856
> URL: https://issues.apache.org/jira/browse/IMPALA-14856
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog, Frontend
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> In local catalog mode, running a query on an unloaded table doesn't guarantee
> that its metadata is loaded on the coordinator side. This is due to a race
> between two metadata flows:
> # The coordinator uses the getPartialCatalogObject RPC to load the metadata
> from catalogd. Once the coordinator gets the response, the table metadata is
> cached.
> # The getPartialCatalogObject RPC on the unloaded table actually triggers
> catalogd to load its metadata. When the loading is done, catalogd bumps the
> table version. The new table version is then collected in the statestore
> update and sent to all coordinators.
> The coordinator that triggered the request invalidates the table metadata
> when it receives the statestore update, regardless of what the table version
> is.
> {code:java}
> private void invalidateCacheForTable(String dbName, String tblName,
>     List<String> invalidated) {
>   TableCacheKey key = new TableCacheKey(dbName.toLowerCase(),
>       tblName.toLowerCase());
>   if (cache_.asMap().remove(key) != null) {
>     invalidated.add("table " + dbName + "." + tblName);
>   }
> }{code}
> [https://github.com/apache/impala/blob/20220fb9232b94d228383fe693a383d2c71a4733/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L1843-L1844]
> Ideally the invalidation should check the cached table version and only
> invalidate the item when it's older than the statestore update. However,
> there are two problems:
> * Checking the cached version using getIfPresent() and removing the item is
> not atomic. Another thread could put a newer version in between, which would
> then be invalidated unintentionally.
> * The value could be a CompletableFuture that is still loading the metadata
> from catalogd. It should be removed only when it's loading an older version,
> but determining which version it's loading is itself a problem.
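
A minimal sketch of how the first problem (the non-atomic check-then-remove) might be avoided, assuming the cache can be modelled as a plain `ConcurrentHashMap` whose values carry their version. The class `VersionedInvalidationSketch`, the `CachedTable` type, and `invalidateIfOlder()` are hypothetical names for illustration. The sketch does not address the second problem of values that are still-loading futures with an unknown version.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical version-aware invalidation; not the CatalogdMetaProvider code. */
public class VersionedInvalidationSketch {
  /** Hypothetical cached value that records the catalog version it was loaded at. */
  static final class CachedTable {
    final long version;
    CachedTable(long version) { this.version = version; }
  }

  private final ConcurrentMap<String, CachedTable> cache = new ConcurrentHashMap<>();

  void put(String key, long version) {
    cache.put(key, new CachedTable(version));
  }

  boolean contains(String key) {
    return cache.containsKey(key);
  }

  /**
   * Removes the entry only if its version is older than updateVersion.
   * ConcurrentHashMap.computeIfPresent runs the check and the removal as one
   * atomic step for the key, so no other thread can insert a newer value in
   * between. Returns true if the entry was removed.
   */
  boolean invalidateIfOlder(String key, long updateVersion) {
    boolean[] removed = {false};
    cache.computeIfPresent(key, (k, cached) -> {
      if (cached.version < updateVersion) {
        removed[0] = true;
        return null;  // returning null removes the mapping
      }
      return cached;  // same or newer version: keep the entry
    });
    return removed[0];
  }

  public static void main(String[] args) {
    VersionedInvalidationSketch s = new VersionedInvalidationSketch();
    s.put("db.tbl", 5);
    System.out.println("update v5 removed entry: " + s.invalidateIfOlder("db.tbl", 5));
    System.out.println("update v6 removed entry: " + s.invalidateIfOlder("db.tbl", 6));
  }
}
```

Whether the map view of the real cache gives the same atomicity guarantee for this call would need to be checked against the cache library in use.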