[ https://issues.apache.org/jira/browse/IMPALA-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740937#comment-16740937 ]

Paul Rogers commented on IMPALA-7534:
-------------------------------------

Let's now focus on the issue raised in the description and the blog post:
simultaneous invalidate and load operations. If the cached objects are
immutable, the lack of synchronization does no harm.

If the objects are mutable, there are larger issues: if code reads 10 
partitions, how does it ensure that all 10 are from the same version?

As noted above, the code uses version numbers. Since the object at a given
version is immutable, unsynchronized invalidate operations do no harm. On the
other hand, if versions are just labels, then the object itself is mutable and
can change from one version to the next at any time. The loading cache is a
poor fit for that use case.
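To make the versioned case concrete, here is a minimal sketch, assuming the catalog version is part of the cache key and the reader knows which version it wants. `VersionedKey` and `VersionKeyedCache` are hypothetical names, not Impala's API, and a plain `ConcurrentHashMap` stands in for Guava's `LoadingCache`. A late, unsynchronized write of version 1 lands under its own key and never replaces the version 2 entry.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache key: (object name, catalog version).
final class VersionedKey {
    final String name;
    final long version;

    VersionedKey(String name, long version) {
        this.name = name;
        this.version = version;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof VersionedKey)) return false;
        VersionedKey k = (VersionedKey) o;
        return version == k.version && name.equals(k.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, version);
    }
}

// Hypothetical sketch: because each (name, version) entry is immutable, racing
// loaders can only ever write the same value for the same key.
final class VersionKeyedCache {
    private final Map<VersionedKey, String> entries = new ConcurrentHashMap<>();

    // A stale loader that writes (name, 1) after version 2 exists only adds an entry
    // that readers asking for version 2 never look at.
    void put(String name, long version, String value) {
        entries.putIfAbsent(new VersionedKey(name, version), value);
    }

    // Returns null on a miss for this exact version.
    String get(String name, long version) {
        return entries.get(new VersionedKey(name, version));
    }
}
```

For example, `put("t1", 2, "t1@v2")` followed by a late `put("t1", 1, "t1@v1")` still leaves `get("t1", 2)` returning `"t1@v2"`. The cost of a "missed" invalidation becomes memory (the stale entry lingers until evicted) rather than correctness.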

Plus, the loading cache does not handle the case of the 10 partitions. Even if
each individual load is perfectly synchronized, the overall fetch is not, so in
the worst case we end up having read, say, three distinct versions.
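The "10 partitions" problem can be reproduced without any real threads. The sketch below is a hypothetical, single-threaded simulation (`TornReadDemo` is an illustrative name, not Impala code): every get and put on the map is atomic, but a writer that runs between two reads of a multi-partition fetch makes that fetch observe mixed versions.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical simulation: per-key operations are atomic, a multi-key fetch is not.
final class TornReadDemo {
    // partition name -> catalog version currently stored for that partition
    private final ConcurrentHashMap<String, Long> partitions = new ConcurrentHashMap<>();

    TornReadDemo(int count) {
        for (int i = 0; i < count; i++) {
            partitions.put("p" + i, 1L);
        }
    }

    // Writer: moves every partition to `version`. replaceAll updates keys one at a
    // time, so it is atomic per key but not across keys.
    void publish(long version) {
        partitions.replaceAll((name, old) -> version);
    }

    // Reader: fetches p0..p(count-1) one at a time. `updateBefore` maps a read index to
    // a version the writer publishes just before that read, making the interleaving
    // deterministic. Returns the set of distinct versions the reader observed.
    Set<Long> readAll(int count, Map<Integer, Long> updateBefore) {
        Set<Long> seen = new HashSet<>();
        for (int i = 0; i < count; i++) {
            Long next = updateBefore.get(i);
            if (next != null) {
                publish(next);
            }
            seen.add(partitions.get("p" + i));
        }
        return seen;
    }
}
```

With `updateBefore = {3 -> 2, 7 -> 3}`, reads of p0..p2 see version 1, p3..p6 see version 2, and p7..p9 see version 3: three distinct versions from a single fetch, even though no individual read was torn.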

So we could go ahead and replace the cache, but it seems we have deeper
concurrency issues.

The general solution would be MVCC: a client that starts reading the partitions
at version 10 does the entire read at version 10, while a concurrent thread that
starts just a bit later does its entire read at version 11.

In this way we solve the concurrency issue not just in the narrow sense of 
invalidation, but in the broader sense of giving the client (the planner) a 
consistent view of the data.
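A minimal MVCC-style sketch, assuming each catalog version can be materialized as an immutable snapshot of its partitions (`Snapshot` and `SnapshotCatalog` are hypothetical names, not Impala's API). A reader pins one snapshot at the start of planning and does all of its reads against it; a writer builds a complete new snapshot and publishes it with a single atomic swap.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable snapshot of one catalog version.
final class Snapshot {
    final long version;
    final Map<String, String> partitions; // partition name -> partition metadata

    Snapshot(long version, Map<String, String> partitions) {
        this.version = version;
        // Defensive copy, then read-only view: the snapshot never changes after creation.
        this.partitions = Collections.unmodifiableMap(new HashMap<>(partitions));
    }
}

final class SnapshotCatalog {
    private final AtomicReference<Snapshot> current;

    SnapshotCatalog(Snapshot initial) {
        current = new AtomicReference<>(initial);
    }

    // Readers call this once and keep the result for the whole read.
    Snapshot pin() {
        return current.get();
    }

    // Writers publish a complete snapshot in one atomic step. Older versions never
    // replace newer ones, so a slow writer cannot roll the catalog back.
    void publish(Snapshot next) {
        current.updateAndGet(prev -> next.version > prev.version ? next : prev);
    }
}
```

A planner that pins version 10 keeps reading version 10 for all of its partitions even if version 11 is published mid-read, and a planner that pins afterwards sees version 11 throughout. The trade-off is memory: old snapshots stay reachable until the last reader holding them finishes.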

> Handle invalidation races in CatalogdMetaProvider cache
> -------------------------------------------------------
>
>                 Key: IMPALA-7534
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7534
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Todd Lipcon
>            Assignee: Paul Rogers
>            Priority: Major
>
> There is a well-known race in Guava's LoadingCache that we are using for 
> CatalogdMetaProvider which we are not currently handling:
> - thread 1 gets a cache miss and makes a request to fetch some data from the 
> catalogd. It fetches the catalog object with version 1 and then gets context 
> switched out or otherwise slow
> - thread 2 receives an invalidation for the same object, because it has 
> changed to v2. It calls 'invalidate' on the cache, but nothing is yet cached.
> - thread 1 puts back v1 of the object into the cache
> In essence we've "missed" an invalidation. This is also described in this 
> nice post: https://softwaremill.com/race-condition-cache-guava-caffeine/
> The race is quite unlikely but could cause some unexpected results that are 
> hard to reason about, so we should look into a fix.
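One possible fix for the race in the description, sketched under the assumption that every invalidation carries the object's new version number: the invalidation leaves a per-key tombstone recording that version, and a loader may only store a value whose version is at least the tombstone's. This is not necessarily what the linked post proposes, `VersionGuardedCache`, `putIfFresh` and the tombstone scheme are hypothetical, and a plain `ConcurrentHashMap` replaces Guava's `LoadingCache`.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: version-guarded puts with tombstones left by invalidations.
final class VersionGuardedCache<V> {
    private static final class Slot<T> {
        final long version; // cached version, or minimum acceptable version for a tombstone
        final T value;      // null marks a tombstone: invalidated, nothing cached

        Slot(long version, T value) {
            this.version = version;
            this.value = value;
        }
    }

    private final ConcurrentHashMap<String, Slot<V>> slots = new ConcurrentHashMap<>();

    // Returns the cached value, or null on a miss or after an invalidation.
    V getIfPresent(String key) {
        Slot<V> s = slots.get(key);
        return s == null ? null : s.value;
    }

    // Loader path (thread 1): store `value`, fetched from catalogd at `version`, unless an
    // invalidation or load for a newer version got there first. compute() runs the check
    // and the write atomically for this key. Returns whether the value was stored.
    boolean putIfFresh(String key, long version, V value) {
        boolean[] stored = {false};
        slots.compute(key, (k, s) -> {
            if (s != null && version < s.version) {
                return s; // stale load: keep the newer entry or tombstone
            }
            stored[0] = true;
            return new Slot<V>(version, value);
        });
        return stored[0];
    }

    // Invalidation path (thread 2): the object changed to `newVersion`.
    void invalidate(String key, long newVersion) {
        slots.compute(key, (k, s) -> {
            if (s != null && s.version >= newVersion) {
                return s; // already holding (or expecting) this version or newer
            }
            return new Slot<V>(newVersion, null);
        });
    }
}
```

Replaying the interleaving from the description: thread 2 calls `invalidate("db.t", 2)` while nothing is cached, which leaves a tombstone at version 2; when thread 1 resumes, `putIfFresh("db.t", 1, ...)` returns false and the next lookup misses and reloads. Tombstones are never removed in this sketch, so a real implementation would need some way to expire them.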



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)