[ 
https://issues.apache.org/jira/browse/CASSANDRA-16681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356635#comment-17356635
 ] 

Adam Holmberg commented on CASSANDRA-16681:
-------------------------------------------

I think I've found a race. I know [~gianluca] also said he's working on a 
patch, so I'm just going to post my findings here so we can compare notes.

What I think is happening: 

LocalPool.addChunk 
[evicts|https://lists.apache.org/thread.html/r75b09da8df530aa382605887a63dfa57b9c8d647b10f9064dd2b027a%40%3Cdev.cassandra.apache.org%3E]
 a non-empty chunk in thread A. {{evict.release()}} finds the chunk "not free" 
and does not recycle it. Meanwhile, thread B calls {{LocalPool.put}}, freeing 
the last buffer from the chunk. {{free}} returns -1, but the [status 
CaS|https://github.com/apache/cassandra/blob/4acfd3bdf1acdb6b28059a49dd39823d7ea0689d/src/java/org/apache/cassandra/utils/memory/BufferPool.java#L808]
 fails because thread A has not yet called {{setEvicted}}. We therefore leave 
the function with the chunk fully freed, but not recycled.
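
To make the interleaving above concrete, here is a minimal single-threaded replay of it. This is only a sketch under stated assumptions: the class {{ChunkRaceSketch}}, its fields, and the simplified {{release}}/{{setEvicted}}/{{put}} methods are hypothetical stand-ins, not the actual BufferPool code, and the shape of the status CaS is assumed from the description rather than copied from the source.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of the interleaving; NOT the real BufferPool code.
class ChunkRaceSketch {
    enum Status { IN_USE, EVICTED }

    static final long ALL_FREE = -1L; // every bit set: all slots are free

    // Assumption: one bit per slot; slot 0 is still handed out at the start.
    final AtomicLong freeSlots = new AtomicLong(~1L);
    final AtomicReference<Status> status = new AtomicReference<>(Status.IN_USE);
    boolean recycled = false;

    // Thread A, step 1: release on eviction; recycles only if the chunk is already free.
    void release() {
        if (freeSlots.get() == ALL_FREE) recycled = true;
    }

    // Thread A, step 2: mark the chunk as evicted.
    void setEvicted() {
        status.compareAndSet(Status.IN_USE, Status.EVICTED);
    }

    // Thread B: return a buffer; recycle only if the chunk is fully free AND already evicted.
    void put(int slot) {
        long free = freeSlots.updateAndGet(bits -> bits | (1L << slot));
        if (free == ALL_FREE && status.compareAndSet(Status.EVICTED, Status.IN_USE))
            recycled = true;
    }

    // True when the chunk is fully free but nobody recycled it.
    boolean leaked() {
        return freeSlots.get() == ALL_FREE && !recycled;
    }

    // Problematic order: A.release, B.put, A.setEvicted.
    static boolean replayRace() {
        ChunkRaceSketch c = new ChunkRaceSketch();
        c.release();    // A: chunk is not free yet, so no recycle
        c.put(0);       // B: chunk becomes fully free, but status is still IN_USE
        c.setEvicted(); // A: too late, no one is left to recycle
        return c.leaked();
    }

    // Benign order: A finishes both steps before B frees the last buffer.
    static boolean replayBenign() {
        ChunkRaceSketch c = new ChunkRaceSketch();
        c.release();
        c.setEvicted();
        c.put(0);       // B: status CaS succeeds, chunk is recycled
        return c.leaked();
    }

    public static void main(String[] args) {
        System.out.println("race order leaks chunk:   " + replayRace());
        System.out.println("benign order leaks chunk: " + replayBenign());
    }
}
```

The replay runs the steps on one thread in a fixed order, so it only illustrates the window between {{release}} and {{setEvicted}}; it does not reproduce the test failure itself.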

I have a patch with some synchronization around the release+status update that 
removes this flakiness. I'm currently wondering how big an issue this actually 
is, and whether we care. I'm also pondering whether this should be moved out of 
4.0, for a number of reasons:

1.) 4.0 is close and I don't have much appetite for touching such integral code
2.) I think this issue is low-impact since the chunk would just be GC'd instead 
of being recycled
3.) The test is fairly pathological and this is perhaps even less likely to 
happen in the running server (pure speculation)
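
As an illustration of what "synchronization around the release+status update" could look like, here is one possible shape. It is a sketch only and not the patch itself: {{ChunkSyncSketch}} and its methods are hypothetical, and it uses a plain monitor, whereas the real code path is lock-free and the actual fix may differ.

```java
// Hypothetical sketch of one way to close the window; NOT the actual patch.
class ChunkSyncSketch {
    enum Status { IN_USE, EVICTED }

    static final long ALL_FREE = -1L; // every bit set: all slots are free

    private long freeSlots = ~1L;     // assumption: slot 0 is still handed out
    private Status status = Status.IN_USE;
    private boolean recycled = false;

    // Thread A: release and status update happen as one step under the chunk's monitor.
    synchronized void evict() {
        if (freeSlots == ALL_FREE) recycled = true; // release(): recycle if already free
        status = Status.EVICTED;                    // setEvicted()
    }

    // Thread B: free the buffer and check the status under the same monitor.
    synchronized void put(int slot) {
        freeSlots |= 1L << slot;
        if (freeSlots == ALL_FREE && status == Status.EVICTED) recycled = true;
    }

    synchronized boolean isRecycled() {
        return recycled;
    }

    // Runs evict and put on two real threads and reports whether the chunk was recycled.
    static boolean runConcurrently() {
        ChunkSyncSketch c = new ChunkSyncSketch();
        Thread a = new Thread(c::evict);
        Thread b = new Thread(() -> c.put(0));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return c.isRecycled();
    }

    public static void main(String[] args) {
        boolean allRecycled = true;
        for (int i = 0; i < 1000; i++) allRecycled &= runConcurrently();
        System.out.println("chunk recycled in every run: " + allRecycled);
    }
}
```

With both sides holding the same monitor, only two orders remain: if A runs first, B sees {{EVICTED}} and recycles; if B runs first, A's release finds the chunk free and recycles. The cost is a lock on a hot path, which is part of why touching this code so close to 4.0 is unappealing.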

Curious to get input on this and hear what Gianluca has found.

> org.apache.cassandra.utils.memory.LongBufferPoolTest - tests are flaky
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-16681
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16681
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CI
>            Reporter: Ekaterina Dimitrova
>            Assignee: Adam Holmberg
>            Priority: Normal
>             Fix For: 4.0, 4.0-rc
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Jenkins history:
> [https://jenkins-cm4.apache.org/job/Cassandra-4.0/50/testReport/junit/org.apache.cassandra.utils.memory/LongBufferPoolTest/testPoolAllocateWithRecyclePartially/history/]
> Fails when run in a loop in CircleCI:
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/844/workflows/945011f4-00ac-4678-89f6-5c0db0a40169/jobs/5008
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
