[
https://issues.apache.org/jira/browse/CASSANDRA-17552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Piotr Kolaczkowski updated CASSANDRA-17552:
-------------------------------------------
Description:
{{LongBufferPoolTest}} fails pretty consistently on my local laptop.
I identified 3 different failure modes:
{noformat}
ERROR [test:1] 2022-04-13 16:29:03,064 LongBufferPoolTest.java:588 - Got throwable null, current chunk [slab java.nio.DirectByteBuffer[pos=0 lim=131072 cap=131072], slots bitmap 1111111111111111111111111111111111111111111111111111111111111111, capacity 131072, free 131072]
java.lang.AssertionError
    at org.apache.cassandra.utils.memory.BufferPool$Chunk.get(BufferPool.java:1315)
    at org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks.get(BufferPool.java:576)
    at org.apache.cassandra.utils.memory.BufferPool$LocalPool.tryGetInternal(BufferPool.java:900)
    at org.apache.cassandra.utils.memory.BufferPool$LocalPool.lambda$new$0(BufferPool.java:739)
    at org.apache.cassandra.utils.memory.BufferPool$LocalPool.addChunkFromParent(BufferPool.java:952)
    at org.apache.cassandra.utils.memory.BufferPool$LocalPool.tryGetInternal(BufferPool.java:907)
    at org.apache.cassandra.utils.memory.BufferPool$LocalPool.tryGet(BufferPool.java:893)
    at org.apache.cassandra.utils.memory.BufferPool$LocalPool.access$000(BufferPool.java:710)
    at org.apache.cassandra.utils.memory.BufferPool.tryGet(BufferPool.java:205)
    at org.apache.cassandra.utils.memory.LongBufferPoolTest$2.testOne(LongBufferPoolTest.java:513)
    at org.apache.cassandra.utils.memory.LongBufferPoolTest$TestUntil.call(LongBufferPoolTest.java:575)
    at org.apache.cassandra.utils.memory.LongBufferPoolTest$TestUntil.call(LongBufferPoolTest.java:553)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
{noformat}
{noformat}
ERROR [main] 2022-04-13 16:30:27,139 LongBufferPoolTest.java:614 - Test failed - null
java.lang.AssertionError: null
    at org.apache.cassandra.utils.memory.LongBufferPoolTest$Debug.check(LongBufferPoolTest.java:106)
    at org.apache.cassandra.utils.memory.LongBufferPoolTest.testAllocate(LongBufferPoolTest.java:288)
    at org.apache.cassandra.utils.memory.LongBufferPoolTest.main(LongBufferPoolTest.java:607)
{noformat}
{noformat}
ERROR [test:1] 2022-04-13 16:36:54,093 LongBufferPoolTest.java:580 - Got exception null, current chunk null
java.lang.NullPointerException
    at org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks.add(BufferPool.java:513)
    at org.apache.cassandra.utils.memory.BufferPool$MicroQueueOfChunks.access$2200(BufferPool.java:480)
    at org.apache.cassandra.utils.memory.BufferPool$LocalPool.addChunk(BufferPool.java:963)
    at org.apache.cassandra.utils.memory.BufferPool$LocalPool.addChunkFromParent(BufferPool.java:956)
    at org.apache.cassandra.utils.memory.BufferPool$LocalPool.tryGetInternal(BufferPool.java:907)
    at org.apache.cassandra.utils.memory.BufferPool$LocalPool.tryGet(BufferPool.java:893)
    at org.apache.cassandra.utils.memory.BufferPool$LocalPool.access$000(BufferPool.java:710)
    at org.apache.cassandra.utils.memory.BufferPool.tryGet(BufferPool.java:205)
    at org.apache.cassandra.utils.memory.LongBufferPoolTest$2.testOne(LongBufferPoolTest.java:512)
    at org.apache.cassandra.utils.memory.LongBufferPoolTest$TestUntil.call(LongBufferPoolTest.java:575)
    at org.apache.cassandra.utils.memory.LongBufferPoolTest$TestUntil.call(LongBufferPoolTest.java:553)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)
{noformat}
Branch: cassandra-4.0, commit d1270c204f31578212bfca5860ab46abeaec22b9
So far I've found the following problems with the code (this list might not be
complete):
Problem 1:
{{LocalPool}} documentation states that allocations from the local pool can be
done by a single thread only, but releases can be done by any thread. This
means {{LocalPool}} is shared between threads and should be thread safe.
Unfortunately the implementation is far from thread safe, because {{LocalPool}}
has mutable and unsynchronized state in {{MicroQueueOfChunks}}.
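To make the constraint in Problem 1 concrete, here is a minimal sketch of one way a "single allocator, any releaser" pool can keep its mutable state confined to the owner thread. This is not the {{BufferPool}} implementation and not a proposed patch; {{OwnerConfinedPool}}, its slot ids and the {{released}} queue are hypothetical names chosen for illustration only.

```java
import java.util.ArrayDeque;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical pool: one owner thread allocates, any thread may release. */
final class OwnerConfinedPool {
    // Touched only by the owner thread, so a plain (unsynchronized) collection is enough.
    private final ArrayDeque<Integer> free = new ArrayDeque<>();
    // Thread-safe hand-off point for releases coming from arbitrary threads.
    private final ConcurrentLinkedQueue<Integer> released = new ConcurrentLinkedQueue<>();

    OwnerConfinedPool(int slots) {
        for (int i = 0; i < slots; i++) free.add(i);
    }

    /** Owner thread only: returns a slot id, or -1 if none is free. */
    int allocate() {
        // Drain releases handed over by other threads into the owner-only state.
        Integer slot;
        while ((slot = released.poll()) != null) free.add(slot);
        Integer next = free.poll();
        return next == null ? -1 : next;
    }

    /** Any thread: never touches the owner-only state directly. */
    void release(int slot) {
        released.add(slot);
    }
}
```

The point of the sketch is only that every field reachable from {{release}} must be safe for concurrent access; whether a hand-off queue, locking or atomics would suit {{MicroQueueOfChunks}} is a separate question.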
Possible problem 2:
There seems to be an assumption that a {{Chunk}} may be released only when no
more allocations from it are in progress. However, I believe this assumption
does not hold, and I can't see any code enforcing it. Because {{release}}
can be called by a thread other than the owner, it may clear the owner and
immediately clear the {{freeSlots}} bitmap in line 1150 while a concurrent
allocation is still in progress. Clearing the flags at the wrong moment would
cause the assertion in line 1315 ({{BufferPool$Chunk.get}} in the first stack
trace) to fail.
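As an illustration of the interleaving described in Problem 2, the sketch below shows only the allocation side of a hypothetical 64-slot chunk. It reads the bitmap once and commits with a compare-and-set, so a concurrent clear of the bitmap shows up as a failed allocation instead of a violated assertion. {{SketchChunk}}, {{allocate}} and {{retire}} are assumptions made for this example and do not mirror the actual {{Chunk}} code.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical 64-slot chunk; a set bit means the slot is free. */
final class SketchChunk {
    private final AtomicLong freeSlots = new AtomicLong(-1L); // all 64 bits set: all slots free

    /** Returns the claimed slot index, or -1 if the bitmap is empty (full or retired). */
    int allocate() {
        while (true) {
            long cur = freeSlots.get();
            if (cur == 0L) return -1; // nothing to hand out
            int slot = Long.numberOfTrailingZeros(cur); // lowest free slot
            long next = cur & ~(1L << slot);
            // The CAS fails if another thread changed the bitmap in between;
            // in that case re-read the state instead of acting on a stale value.
            if (freeSlots.compareAndSet(cur, next)) return slot;
        }
    }

    /** Any thread: clears the bitmap so that later allocations fail cleanly. */
    void retire() {
        freeSlots.set(0L);
    }
}
```

The sketch deliberately leaves out releasing individual slots and recycling a retired chunk, which is where the real ordering questions between owner and releaser arise.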
> LongBufferPoolTest failing, several data races in BufferPool
> ------------------------------------------------------------
>
> Key: CASSANDRA-17552
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17552
> Project: Cassandra
> Issue Type: Bug
> Reporter: Piotr Kolaczkowski
> Priority: Normal