[ https://issues.apache.org/jira/browse/IMPALA-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869176#comment-16869176 ]
ASF subversion and git services commented on IMPALA-7534:
---------------------------------------------------------
Commit 8431a95698b6f687ac8862cc6549e1949af0b034 in impala's branch
refs/heads/master from Todd Lipcon
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8431a95 ]
IMPALA-7534. Handle invalidation races in CatalogdMetaProvider
This handles a race condition in which a cache invalidation that ran
concurrently with a cache load could be skipped, leaving out-of-date data
in the cache. The symptom was spurious "table not found" errors.
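One general way to guard against this kind of race is to stamp each key with an invalidation epoch and discard any load that overlapped an invalidation. The following is a minimal sketch of that idea only. It is not taken from the commit, and the class EpochGuardedCache and its methods are hypothetical names that do not exist in Impala or Guava.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration; not the code from this commit.
class EpochGuardedCache<K, V> {
    private final Map<K, V> values = new HashMap<>();
    // Number of invalidations seen per key (assumed small enough to keep around).
    private final Map<K, Long> epochs = new HashMap<>();

    // Drop any cached value and bump the epoch so in-flight loads are rejected.
    public synchronized void invalidate(K key) {
        epochs.merge(key, 1L, Long::sum);
        values.remove(key);
    }

    public V get(K key, Supplier<V> loader) {
        long epochBeforeLoad;
        synchronized (this) {
            V cached = values.get(key);
            if (cached != null) {
                return cached;
            }
            epochBeforeLoad = epochs.getOrDefault(key, 0L);
        }
        // The slow fetch runs outside the lock, so invalidations can interleave.
        V loaded = loader.get();
        synchronized (this) {
            // Cache the result only if no invalidation arrived during the load.
            if (epochs.getOrDefault(key, 0L) == epochBeforeLoad) {
                values.put(key, loaded);
            }
        }
        return loaded;
    }

    // Test helper: look at the cached value without triggering a load.
    public synchronized V peek(K key) {
        return values.get(key);
    }
}
```

In this sketch the caller that raced with the invalidation still receives the value it loaded, but that value is never stored, so the next lookup fetches fresh data.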
A new test case triggers the issue reliably by injecting latency into
the metadata fetch RPC and running DDLs concurrently on the same
database across 8 threads. With the fix, the test passes reliably.
Another way to fix this might have been to switch from Guava's loading
cache to Caffeine. However, Caffeine requires Java 8, and LocalCatalog
is being backported to Impala 2.x, which can still run on Java 7. So,
working around the Guava issue makes this backport, and future ones,
easier.
Change-Id: I70f377db88e204825a909389f28dc3451815235c
Reviewed-on: http://gerrit.cloudera.org:8080/13664
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Handle invalidation races in CatalogdMetaProvider cache
> -------------------------------------------------------
>
> Key: IMPALA-7534
> URL: https://issues.apache.org/jira/browse/IMPALA-7534
> Project: IMPALA
> Issue Type: Sub-task
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Major
> Fix For: Not Applicable
>
>
> Guava's LoadingCache, which we use for CatalogdMetaProvider, has a
> well-known race that we do not currently handle:
> - thread 1 gets a cache miss and makes a request to fetch some data from the
> catalogd. It fetches the catalog object with version 1 and is then context
> switched out or otherwise delayed
> - thread 2 receives an invalidation for the same object, because it has
> changed to v2. It calls 'invalidate' on the cache, but nothing is yet cached.
> - thread 1 puts back v1 of the object into the cache
> In essence we've "missed" an invalidation. This is also described in this
> nice post: https://softwaremill.com/race-condition-cache-guava-caffeine/
> The race is quite unlikely but could cause some unexpected results that are
> hard to reason about, so we should look into a fix.
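The interleaving described above can be reproduced deterministically with two latches. This is a minimal sketch under stated assumptions: a plain ConcurrentHashMap stands in for the cache, a static field stands in for the version held by catalogd, and the class name and the key "tbl" are made up for illustration. It does not use Guava or any Impala code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo of the missed invalidation; not Impala or Guava code.
class MissedInvalidationDemo {
    // Stand-in for the metadata cache.
    static final ConcurrentHashMap<String, Integer> cache = new ConcurrentHashMap<>();
    // Stand-in for the object version currently held by catalogd.
    static volatile int catalogVersion = 1;

    // Forces the three steps to happen in order and returns the cached version.
    static int run() throws InterruptedException {
        cache.clear();
        catalogVersion = 1;
        CountDownLatch fetched = new CountDownLatch(1);
        CountDownLatch invalidated = new CountDownLatch(1);

        Thread loader = new Thread(() -> {
            int fetchedVersion = catalogVersion; // thread 1 fetches v1
            fetched.countDown();
            try {
                invalidated.await();             // thread 1 is "switched out"
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            cache.put("tbl", fetchedVersion);    // thread 1 puts back v1
        });
        loader.start();

        fetched.await();
        catalogVersion = 2;                      // the object changes to v2
        cache.remove("tbl");                     // invalidate: nothing is cached yet
        invalidated.countDown();
        loader.join();

        return cache.get("tbl");
    }

    public static void main(String[] args) throws InterruptedException {
        int cached = run();
        // prints "cached=1 current=2"
        System.out.println("cached=" + cached + " current=" + catalogVersion);
    }
}
```

The cache ends up holding v1 even though the invalidation for v2 was delivered, and nothing will evict it until another invalidation arrives.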