[ 
https://issues.apache.org/jira/browse/CASSANDRA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750257#action_12750257
 ] 

Ben Manes commented on CASSANDRA-405:
-------------------------------------

When performing a postmortem on this issue, please review how the 
ConcurrentLinkedHashMap was added.  The project page stated:

> Note: The algorithm needs further testing and is not deemed production ready. 
> It is functional under concurrent tests, but needs additional load testing to 
> assert correctness.

That load testing, provided in the standard unit test runs, uncovered the issue 
and thus it was not promoted to a release status.  I haven't had time in the 
last few months to work on this project, but even the last check-in notes that 
it's leaving debug code in place to help resolve the issue later.  The project 
states on the front page and FAQ that the goal is more educational than formal 
usage, hence I avoided known algorithms (which would be the correct approach if 
it were work-related).

The ConcurrentLRUCache uses a watermark approach which is valid, but suffers 
from stampeding and is an offline algorithm.  It's still an excellent approach 
and one of many possibilities described in the FAQ.  I am personally a fan of 
soft-reference based caching for global data, which is evicted in LRU order, 
because it allows the GC to manage what it does best (memory!) and promotes not 
overburdening the application server.
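
To illustrate the watermark approach described above, here is a minimal sketch 
in Java.  It is not the ConcurrentLRUCache code; the class name WatermarkCache, 
the logical clock, and the tryLock guard are all assumptions made for the 
example.  Entries accumulate until the high watermark is exceeded, then one 
batch sweep removes the least recently used entries until the low watermark is 
reached, which is the sense in which eviction is done offline rather than on 
every operation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical watermark-style cache, for illustration only.
final class WatermarkCache<K, V> {
    // Value plus the logical time of its last access.
    private static final class Entry<V> {
        final V value;
        volatile long lastAccess;
        Entry(V value, long lastAccess) {
            this.value = value;
            this.lastAccess = lastAccess;
        }
    }

    // Immutable (key, timestamp) pair so the sort sees stable values.
    private static final class Stamped<K> {
        final K key;
        final long stamp;
        Stamped(K key, long stamp) {
            this.key = key;
            this.stamp = stamp;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final AtomicLong clock = new AtomicLong();
    private final ReentrantLock sweepLock = new ReentrantLock();
    private final int lowWatermark;
    private final int highWatermark;

    WatermarkCache(int lowWatermark, int highWatermark) {
        if (lowWatermark < 0 || highWatermark < lowWatermark) {
            throw new IllegalArgumentException("need 0 <= low <= high");
        }
        this.lowWatermark = lowWatermark;
        this.highWatermark = highWatermark;
    }

    V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) {
            return null;
        }
        e.lastAccess = clock.incrementAndGet(); // record the access
        return e.value;
    }

    void put(K key, V value) {
        map.put(key, new Entry<>(value, clock.incrementAndGet()));
        if (map.size() > highWatermark) {
            sweep();
        }
    }

    int size() {
        return map.size();
    }

    // Batch eviction: drop the oldest entries until the low watermark is reached.
    private void sweep() {
        if (!sweepLock.tryLock()) {
            return; // another thread is already sweeping
        }
        try {
            List<Stamped<K>> snapshot = new ArrayList<>();
            for (Map.Entry<K, Entry<V>> e : map.entrySet()) {
                snapshot.add(new Stamped<>(e.getKey(), e.getValue().lastAccess));
            }
            snapshot.sort(Comparator.comparingLong(s -> s.stamp));
            int excess = snapshot.size() - lowWatermark;
            for (int i = 0; i < excess; i++) {
                // May drop an entry touched after the snapshot; fine for a sketch.
                map.remove(snapshot.get(i).key);
            }
        } finally {
            sweepLock.unlock();
        }
    }
}
```

With new WatermarkCache<String, Integer>(2, 4), the fifth put triggers a sweep 
that leaves the two most recently used entries.  The cost of the sweep (a full 
snapshot and sort) lands on whichever writer crosses the high watermark.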

Please treat this as an issue where the blame is shared: third-party, as I did 
not stress heavily enough that this should not be used in production, and 
internal, for not evaluating a third-party project closely enough to recognize 
that it warned about its production status.  I will update the project page to 
communicate this better and provide a performant modification that is 
thread-safe for those who need a solution.  Please re-evaluate your own 
internal processes to determine why the bad call was made.

I am not trying to shift blame, but my pet peeve is firefighting in production 
when no one learns from it, because then it just happens again.  It's very 
frustrating, even more so if I actually work there! ;-)

Cheers!
Ben

> Race condition with ConcurrentLinkedHashMap
> -------------------------------------------
>
>                 Key: CASSANDRA-405
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-405
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.4
>            Reporter: Chris Goffinet
>            Assignee: Jonathan Ellis
>             Fix For: 0.4
>
>         Attachments: 405.patch, stack.log.gz
>
>
> We are seeing a race condition with ConcurrentLinkedHashMap using 
> appendToTail. We could remove the ConcurrentLinkedHashMap for now until 
> that's resolved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
