[ https://issues.apache.org/jira/browse/CASSANDRA-16532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17307422#comment-17307422 ]

Adam Holmberg commented on CASSANDRA-16532:
-------------------------------------------

I think I see what's happening. The reference count for a ChunkCache buffer is
[allowed to go below zero|https://github.com/apache/cassandra/blob/bf96367f4d55692017e144980cf17963e31df127/src/java/org/apache/cassandra/cache/ChunkCache.java#L135].
It is then possible to [find a non-zero refCount (-1), increment it from -1 to 0 and return a non-null reference, and then arrive at {{buffer}} only to find that the reference count is now zero|https://github.com/apache/cassandra/blob/bf96367f4d55692017e144980cf17963e31df127/src/java/org/apache/cassandra/cache/ChunkCache.java#L111-L122].
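To make the interleaving concrete, here is a minimal, single-threaded Java sketch. {{RacyBuffer}} is a hypothetical class modeled loosely on the linked {{ChunkCache.Buffer}} code, not a copy of it: {{release()}} decrements unconditionally, {{reference()}} rejects only a count of exactly zero, and {{usable()}} stands in for the check made on reaching {{buffer}}. Two releases drive the count to -1 (where the extra release comes from is not established here), after which {{reference()}} succeeds but leaves the count at zero.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical, simplified model of a reference-counted cache buffer.
 * release() may drive the count below zero, and reference() only refuses
 * to hand out a reference when the count is exactly zero.
 */
public class RacyBuffer
{
    // Starts at 1: the creator (e.g. the cache) holds the initial reference.
    private final AtomicInteger references = new AtomicInteger(1);

    /** Returns this buffer if a reference could be taken, otherwise null. */
    RacyBuffer reference()
    {
        int refCount;
        do
        {
            refCount = references.get();
            if (refCount == 0)
                return null; // only the exact value 0 is rejected; -1 slips through
        }
        while (!references.compareAndSet(refCount, refCount + 1));
        return this;
    }

    /** Unconditional decrement: nothing stops the count from going negative. */
    void release()
    {
        references.decrementAndGet();
    }

    /** Assumed stand-in for the "references > 0" check made when reaching buffer. */
    boolean usable()
    {
        return references.get() > 0;
    }

    int refCount()
    {
        return references.get();
    }

    public static void main(String[] args)
    {
        RacyBuffer b = new RacyBuffer();
        b.release(); // first release:              1 -> 0
        b.release(); // a second, unmatched release: 0 -> -1
        RacyBuffer ref = b.reference(); // -1 is "non-zero", so CAS -1 -> 0 succeeds
        System.out.println("reference returned: " + (ref != null)); // prints "reference returned: true"
        System.out.println("refCount now: " + b.refCount());        // prints "refCount now: 0"
        System.out.println("usable: " + b.usable());                // prints "usable: false"
    }
}
```

The caller holds what looks like a valid reference, yet the count it sees afterwards is zero, which is exactly the state described above.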


We get into this state while racing with an asynchronous task that is closing
the file.

The file is being closed as part of the tidy task:
{noformat}
[junit-timeout]         at org.apache.cassandra.cache.ChunkCache$Buffer.release(ChunkCache.java:158)
[junit-timeout]         at org.apache.cassandra.cache.ChunkCache.onRemoval(ChunkCache.java:187)
[junit-timeout]         at org.apache.cassandra.cache.ChunkCache.onRemoval(ChunkCache.java:41)
[junit-timeout]         at com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$notifyRemoval$1(BoundedLocalCache.java:286)
[junit-timeout]         at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
[junit-timeout]         at com.github.benmanes.caffeine.cache.BoundedLocalCache.notifyRemoval(BoundedLocalCache.java:292)
[junit-timeout]         at com.github.benmanes.caffeine.cache.BoundedLocalCache.removeNoWriter(BoundedLocalCache.java:1731)
[junit-timeout]         at com.github.benmanes.caffeine.cache.BoundedLocalCache.remove(BoundedLocalCache.java:1695)
[junit-timeout]         at com.github.benmanes.caffeine.cache.LocalCache.invalidateAll(LocalCache.java:126)
[junit-timeout]         at com.github.benmanes.caffeine.cache.LocalManualCache.invalidateAll(LocalManualCache.java:79)
[junit-timeout]         at org.apache.cassandra.cache.ChunkCache.invalidateFile(ChunkCache.java:218)
[junit-timeout]         at org.apache.cassandra.io.util.FileHandle$Cleanup.lambda$tidy$0(FileHandle.java:208)
[junit-timeout]         at java.util.Optional.ifPresent(Optional.java:159)
[junit-timeout]         at org.apache.cassandra.io.util.FileHandle$Cleanup.tidy(FileHandle.java:208)
[junit-timeout]         at org.apache.cassandra.utils.concurrent.Ref$GlobalState.release(Ref.java:325)
[junit-timeout]         at org.apache.cassandra.utils.concurrent.Ref$State.ensureReleased(Ref.java:203)
[junit-timeout]         at org.apache.cassandra.utils.concurrent.Ref.ensureReleased(Ref.java:128)
[junit-timeout]         at org.apache.cassandra.utils.concurrent.SharedCloseableImpl.close(SharedCloseableImpl.java:45)
[junit-timeout]         at org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier$1.run(SSTableReader.java:2058)
[junit-timeout]         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
{noformat}

That tidy task was scheduled by the previous scrub test:
{noformat}
[junit-timeout]         at org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier.tidy(SSTableReader.java:2020)
[junit-timeout]         at org.apache.cassandra.utils.concurrent.Ref$GlobalState.release(Ref.java:325)
[junit-timeout]         at org.apache.cassandra.utils.concurrent.Ref$State.release(Ref.java:224)
[junit-timeout]         at org.apache.cassandra.utils.concurrent.Ref.release(Ref.java:118)
[junit-timeout]         at org.apache.cassandra.db.compaction.Scrubber.lambda$scrub$0(Scrubber.java:303)
[junit-timeout]         at java.util.ArrayList.forEach(ArrayList.java:1257)
[junit-timeout]         at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:303)
[junit-timeout]         at org.apache.cassandra.tools.StandaloneScrubber.main(StandaloneScrubber.java:226)
[junit-timeout]         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit-timeout]         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[junit-timeout]         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[junit-timeout]         at java.lang.reflect.Method.invoke(Method.java:498)
[junit-timeout]         at org.apache.cassandra.tools.ToolRunner.runClassAsTool(ToolRunner.java:82)
[junit-timeout]         at org.apache.cassandra.tools.ToolRunner$2.get(ToolRunner.java:249)
[junit-timeout]         at org.apache.cassandra.tools.ToolRunner$2.get(ToolRunner.java:245)
[junit-timeout]         at org.apache.cassandra.tools.ToolRunner.invokeSupplier(ToolRunner.java:305)
[junit-timeout]         at org.apache.cassandra.tools.ToolRunner.invokeClass(ToolRunner.java:253)
[junit-timeout]         at org.apache.cassandra.tools.ToolRunner.invokeClass(ToolRunner.java:235)
[junit-timeout]         at org.apache.cassandra.db.ScrubTest.testHeaderFixWithTool(ScrubTest.java:874)
{noformat}

I had hoped it would be sufficient to disallow negative reference counts, but at
first glance that change is revealing other issues. The work continues.
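For illustration, disallowing negative counts could look like the following minimal Java sketch. {{StrictBuffer}} is hypothetical and is not the change made for this ticket: {{reference()}} refuses any count that is not strictly positive, and {{release()}} uses a compare-and-set loop that throws instead of decrementing past zero. Failing loudly on an over-release is presumably one way the other issues would surface, since a caller that releases once too often can no longer silently leave the count at -1.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch of a reference count that is never allowed to go negative.
 * It only illustrates the idea; it is not the fix adopted for CASSANDRA-16532.
 */
public class StrictBuffer
{
    // Starts at 1: the creator holds the initial reference.
    private final AtomicInteger references = new AtomicInteger(1);

    /** Takes a reference only while the count is strictly positive, otherwise returns null. */
    StrictBuffer reference()
    {
        int refCount;
        do
        {
            refCount = references.get();
            if (refCount <= 0)
                return null;
        }
        while (!references.compareAndSet(refCount, refCount + 1));
        return this;
    }

    /**
     * Decrements the count but refuses to go below zero. Returns true when this
     * call dropped the last reference (the point where memory could be recycled).
     */
    boolean release()
    {
        int refCount;
        do
        {
            refCount = references.get();
            if (refCount <= 0)
                throw new IllegalStateException("release() called on an already released buffer");
        }
        while (!references.compareAndSet(refCount, refCount - 1));
        return refCount == 1;
    }

    int refCount()
    {
        return references.get();
    }

    public static void main(String[] args)
    {
        StrictBuffer b = new StrictBuffer();
        System.out.println("last reference dropped: " + b.release());   // prints "last reference dropped: true"
        System.out.println("reference after release: " + b.reference()); // prints "reference after release: null"
        try
        {
            b.release(); // an over-release is now reported instead of silently reaching -1
        }
        catch (IllegalStateException e)
        {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The trade-off is that any existing code path that over-releases now throws, so enabling this check tends to expose those callers first.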

> Fix flaky testSkipScrubCorruptedCounterRowWithTool
> --------------------------------------------------
>
>                 Key: CASSANDRA-16532
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16532
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/unit
>            Reporter: Berenguer Blasi
>            Assignee: Berenguer Blasi
>            Priority: Normal
>             Fix For: 4.0-rc
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Fix flaky [testSkipScrubCorruptedCounterRowWithTool|https://ci-cassandra.apache.org/job/Cassandra-trunk/365/testReport/junit/org.apache.cassandra.db/ScrubTest/testSkipScrubCorruptedCounterRowWithTool_compression/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
