[ https://issues.apache.org/jira/browse/CASSANDRA-16681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356635#comment-17356635 ]
Adam Holmberg commented on CASSANDRA-16681:
-------------------------------------------

I think I've found a race. I know [~gianluca] also said he's working on a patch, so I'm just going to post my findings here so we can compare notes.

What I think is happening: LocalPool.addChunk [evicts|https://lists.apache.org/thread.html/r75b09da8df530aa382605887a63dfa57b9c8d647b10f9064dd2b027a%40%3Cdev.cassandra.apache.org%3E] a non-empty chunk in thread A. {{evict.release()}} finds a "not free" chunk and does not recycle it. Meanwhile, thread B does {{LocalPool.put}}, freeing the last buffer from the chunk. {{free}} returns -1, but the [status CAS|https://github.com/apache/cassandra/blob/4acfd3bdf1acdb6b28059a49dd39823d7ea0689d/src/java/org/apache/cassandra/utils/memory/BufferPool.java#L808] fails because thread A has not yet called {{setEvicted}}. We therefore leave the function with the chunk completely free, but not recycled.

I have a patch with some synchronization around the release+status update that removes this flakiness. I'm currently wondering how big an issue this actually is, and whether we care. I'm also pondering whether this should be moved out of 4.0, for a few reasons:
1.) 4.0 is close, and I don't have much appetite for touching such integral code.
2.) I think this issue is low-impact, since the chunk would just be GC'd instead of being recycled.
3.) The test is fairly pathological, and this is perhaps even less likely to happen in a running server (pure speculation).

Curious to get input on this and hear what Gianluca has found.
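To make the interleaving concrete, here is a minimal, single-threaded replay of it in a deliberately simplified model. Everything in it is hypothetical: the {{Chunk}} class, the {{IN_USE}}/{{EVICTED}}/{{RECYCLED}} states, and the methods {{releaseIfFree}}, {{put}}, {{evictAtomically}} and {{putAtomically}} are illustrative stand-ins, not Cassandra's BufferPool API ({{setEvicted}} only borrows the name from the comment). The "fixed" variant is just one possible shape of "synchronization around the release+status update", not the actual patch.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the race; NOT Cassandra's BufferPool.Chunk.
public class ChunkRaceSketch {
    static final int IN_USE = 0, EVICTED = 1, RECYCLED = 2;

    static final class Chunk {
        final AtomicInteger status = new AtomicInteger(IN_USE);
        int allocated;    // buffers still handed out (not thread-safe in the racy variant)
        boolean recycled;

        Chunk(int allocated) { this.allocated = allocated; }

        boolean isFree() { return allocated == 0; }

        void recycle() { recycled = true; status.set(RECYCLED); }

        // Racy variant: the owner's free check and its status change are separate steps.

        /** Owner step 1, loosely mirrors evict.release(): recycle only if already free. */
        void releaseIfFree() { if (isFree()) recycle(); }

        /** Owner step 2, loosely mirrors setEvicted(). */
        void setEvicted() { status.compareAndSet(IN_USE, EVICTED); }

        /** Other thread returns a buffer; it recycles only if the CAS from EVICTED succeeds. */
        void put() {
            allocated--;
            if (isFree() && status.compareAndSet(EVICTED, RECYCLED)) recycled = true;
        }

        // Fixed variant: both critical sections hold the same monitor, so one
        // side always observes the other's effect and someone recycles.

        synchronized void evictAtomically() {
            if (isFree()) recycle(); else status.set(EVICTED);
        }

        synchronized void putAtomically() {
            allocated--;
            if (isFree() && status.get() == EVICTED) recycle();
        }
    }

    /** Replays the interleaving from the comment, one step at a time. */
    static Chunk replayRacy() {
        Chunk c = new Chunk(1); // thread B still holds one buffer
        c.releaseIfFree();      // A: chunk not free yet, so no recycle
        c.put();                // B: last buffer freed, but CAS fails (status is still IN_USE)
        c.setEvicted();         // A: too late; nobody will recycle this chunk now
        return c;
    }

    /** Under a shared lock only two serial orders are possible; replay either one. */
    static Chunk replayFixed(boolean evictFirst) {
        Chunk c = new Chunk(1);
        if (evictFirst) { c.evictAtomically(); c.putAtomically(); }
        else            { c.putAtomically();   c.evictAtomically(); }
        return c;
    }

    public static void main(String[] args) {
        Chunk racy = replayRacy();
        System.out.println("racy:  free=" + racy.isFree() + " recycled=" + racy.recycled);
        System.out.println("fixed: recycled=" + replayFixed(true).recycled
                + "/" + replayFixed(false).recycled);
    }
}
```

Running {{main}} prints {{racy:  free=true recycled=false}} followed by {{fixed: recycled=true/true}}. The racy replay ends with a completely free chunk that nothing will recycle, which matches the report above. With the shared lock, whichever side runs second sees the other's effect, so the chunk is recycled in both orders. The trade-off is a lock on what is otherwise a lock-free path.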
> org.apache.cassandra.utils.memory.LongBufferPoolTest - tests are flaky
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-16681
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16681
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CI
>            Reporter: Ekaterina Dimitrova
>            Assignee: Adam Holmberg
>            Priority: Normal
>             Fix For: 4.0, 4.0-rc
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Jenkins history:
> [https://jenkins-cm4.apache.org/job/Cassandra-4.0/50/testReport/junit/org.apache.cassandra.utils.memory/LongBufferPoolTest/testPoolAllocateWithRecyclePartially/history/]
> Fails being run in a loop in CircleCI:
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/844/workflows/945011f4-00ac-4678-89f6-5c0db0a40169/jobs/5008

--
This message was sent by Atlassian Jira
(v8.3.4#803005)