[ https://issues.apache.org/jira/browse/CASSANDRA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750257#action_12750257 ]
Ben Manes commented on CASSANDRA-405:
-------------------------------------
When performing a postmortem on this issue, please review how the
ConcurrentLinkedHashMap was added. The project page stated:
> Note: The algorithm needs further testing and is not deemed production ready.
> It is functional under concurrent tests, but needs additional load testing to
> assert correctness.
That load testing, provided in the standard unit test runs, uncovered the issue
and thus it was not promoted to a release status. I haven't had time in the
last few months to work on this project, but even the last check-in notes that
it's leaving debug code in place to help resolve the issue later. The project
states on the front page and in the FAQ that the goal is more educational than
formal usage, hence I avoided known algorithms (which would be the correct
approach if it were work-related).
The ConcurrentLRUCache uses a watermark approach which is valid, but suffers
from stampeding and is an offline algorithm. It's still an excellent approach
and one of many possibilities described in the FAQ. I am personally a fan of
soft-reference based caching for global data, which is evicted in LRU order,
because it allows the GC to manage what it does best (memory!) and promotes not
overburdening the application server.
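
For comparison with the approaches above, the simplest thread-safe bounded LRU
cache in Java is a LinkedHashMap in access order guarded by a single lock. The
sketch below is only an illustration: the class name SimpleLruCache is
hypothetical, it is neither the ConcurrentLinkedHashMap nor the
ConcurrentLRUCache algorithm, and it trades concurrency for simplicity because
every read and write takes the same lock.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration: a bounded LRU cache guarded by one lock. */
class SimpleLruCache<K, V> {
    private final Map<K, V> map;

    SimpleLruCache(final int capacity) {
        // accessOrder=true moves an entry to the tail on every get/put,
        // so the head of the iteration order is the least recently used.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Called after each insertion; evict once over capacity.
                return size() > capacity;
            }
        };
    }

    // Even get() mutates the access order, so reads must be locked too.
    synchronized V get(K key) { return map.get(key); }

    synchronized V put(K key, V value) { return map.put(key, value); }

    synchronized int size() { return map.size(); }
}
```

With a capacity of 2, inserting "a" and "b", reading "a", and then inserting
"c" evicts "b", since "b" is the least recently used entry at that point.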
Please treat this as an issue where the blame is shared: third-party, as I did
not stress heavily enough that this should not be used in production, and
internal, for not evaluating a third-party project closely enough to recognize
that it warned about its production status. I will update the project page to
communicate this better and provide a performant modification that is
thread-safe for those that need a solution. Please re-evaluate your own
internal processes to determine why the bad call was made.
I am not trying to shift blame, but my pet peeve is firefighting in production
when no one learns from it, because then it just happens again. It's very
frustrating, even more so if I actually work there! ;-)
Cheers!
Ben
> Race condition with ConcurrentLinkedHashMap
> -------------------------------------------
>
> Key: CASSANDRA-405
> URL: https://issues.apache.org/jira/browse/CASSANDRA-405
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.4
> Reporter: Chris Goffinet
> Assignee: Jonathan Ellis
> Fix For: 0.4
>
> Attachments: 405.patch, stack.log.gz
>
>
> We are seeing a race condition with ConcurrentLinkedHashMap using
> appendToTail. We could remove the ConcurrentLinkedHashMap for now until
> that's resolved.