[
https://issues.apache.org/jira/browse/SOLR-10141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872983#comment-15872983
]
Ben Manes edited comment on SOLR-10141 at 2/18/17 5:08 AM:
-----------------------------------------------------------
Thanks!!! I think I found the bug. It now passes your test case.
The problem was due to put() stampeding over the value during the eviction. The
[eviction
routine|https://github.com/ben-manes/caffeine/blob/65e3efd4b50613c27567ff594877d0f63acfbce2/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java#L725]
performed the following:
# Read the key, value, etc
# Conditionally removed in a computeIfPresent() block
#* resurrected if a race occurred (e.g. was thought expired, but newly accessed)
# Mark the entry as "dead" (using a synchronized (entry) block)
# Notify the listener
This failed because
[putFast|https://github.com/ben-manes/caffeine/blob/65e3efd4b50613c27567ff594877d0f63acfbce2/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java#L1521]
can perform its update outside of a hash table lock (e.g. a computation). It
synchronizes on the entry to update, checking first if it was still alive. This
resulted in a race where the entry was removed from the hash table, the value
updated, and entry marked as dead. When the listener was notified, it received
the wrong value.
The solution I have now is to expand the synchronized block on eviction. This
passes your test and should be cheap. I'd like to review it a little more and
incorporate your test into my suite.
This is an excellent find. I've stared at the code many times and the race
seems obvious in hindsight.
was (Author: ben.manes):
Thanks!!! I think I found the bug. It now passes your test case.
The problem was due to put() stampeding over the value during the eviction. The
[eviction
routine|https://github.com/ben-manes/caffeine/blob/65e3efd4b50613c27567ff594877d0f63acfbce2/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java#L725]
performed the following:
# Read the key, value, etc
# Conditionally removed in a computeIfPresent() block
- resurrected if a race occurred (e.g. was thought expired, but newly
accessed)
# Mark the entry as "dead" (using a synchronized (entry) block)
# Notify the listener
This failed because
[putFast|https://github.com/ben-manes/caffeine/blob/65e3efd4b50613c27567ff594877d0f63acfbce2/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java#L1521]
can perform its update outside of a hash table lock (e.g. a computation). It
synchronizes on the entry to update, checking first if it was still alive. This
resulted in a race where the entry was removed from the hash table, the value
updated, and entry marked as dead. When the listener was notified, it received
the wrong value.
The solution I have now is to expand the synchronized block on eviction. This
passes your test and should be cheap. I'd like to review it a little more and
incorporate your test into my suite.
This is an excellent find. I've stared at the code many times and the race
seems obvious in hindsight.
> Caffeine cache causes BlockCache corruption
> --------------------------------------------
>
> Key: SOLR-10141
> URL: https://issues.apache.org/jira/browse/SOLR-10141
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Yonik Seeley
> Attachments: SOLR-10141.patch, Solr10141Test.java
>
>
> After fixing the race conditions in the BlockCache itself (SOLR-10121), the
> concurrency test passes with the previous implementation using
> ConcurrentLinkedHashMap and fail with Caffeine.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]