[
https://issues.apache.org/jira/browse/CASSANDRA-17298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880116#comment-17880116
]
Dmitry Konstantinov edited comment on CASSANDRA-17298 at 9/8/24 11:55 AM:
--------------------------------------------------------------------------
As of now I cannot reproduce it locally (macOS): I have executed similar
logic (without a container) 200 times using AdoptOpenJDK 1.8.0_265 and 200
times using AdoptOpenJDK 11.0.11, and all of the runs were successful:
{code:bash}
count=200
for i in $(seq -w 1 $count)
do
  echo "Running test org.apache.cassandra.db.memtable.MemtableSizeOffheapBuffersTest, iteration $i of $count"
  ant testsome -Dtest.name=org.apache.cassandra.db.memtable.MemtableSizeOffheapBuffersTest -Dno-build-test=true
  mkdir -p build/repeated_tests/$i
  mv build/test/output/TEST-org.apache.cassandra.db.memtable.MemtableSizeOffheapBuffersTest.xml build/repeated_tests/$i
done
{code}
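For completeness, a quick way to scan the collected reports afterwards (a sketch; it assumes the report layout produced by the loop above and the standard {{failures}} attribute of the Ant JUnit XML output):

```shell
# Count how many iterations produced a failing report, based on the
# failures attribute in the collected JUnit XML files.
grep -l 'failures="[1-9]' build/repeated_tests/*/TEST-*.xml 2>/dev/null | wc -l
```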
Observations based on the provided logs: there is a fluctuation in the memory
measured within the test by MemoryMeter.measureDeep(memtable). In successful
runs on CI (as well as locally in my case) it is ~79MiB; in a failing run it
has a higher value, around ~86MiB:
{code:java}
[junit-timeout] INFO [main] 2024-09-07 12:03:39,024 MemtableSizeTestBase.java:176 - Expected heap usage close to 78.755MiB, got 78.754MiB, 0.797KiB difference. Delta per partition: 0.01 bytes
{code}
{code:java}
[junit-timeout] INFO [main] 2024-09-07 12:03:58,623 MemtableSizeTestBase.java:176 - Expected heap usage close to 85.836MiB, got 78.878MiB, 6.957MiB difference. Delta per partition: 112.24 bytes
{code}
The value reported by the Allocator logic (which is what we actually test) is
consistent across all runs.
Taking into account that the issue happens only for the offheap_buffers mode, I
suspect that the possible cause is unexpected shared objects which
MemoryMeter traverses from DirectByteBuffer objects through the "att" and
"cleaner" references.
An experiment which shows that this is possible:
{code:java}
@Test
public void test() {
    for (int i = 0; i < 10; i++) {
        ByteBuffer byteBuffer = ByteBuffer.allocateDirect(1);
    }
    ByteBuffer byteBuffer = ByteBuffer.allocateDirect(10);
    MemoryMeter meter = new MemoryMeter().withGuessing(MemoryMeter.Guess.FALLBACK_UNSAFE)
                                         .enableDebug(100)
                                         //.omitSharedBufferOverhead()
                                         .ignoreNonStrongReferences()
                                         .ignoreKnownSingletons();
    System.out.println(meter.measureDeep(byteBuffer));
}
{code}
{code:java}
root [java.nio.DirectByteBuffer] 1.51 KB (64 bytes)
|
+--cleaner [sun.misc.Cleaner] 1.45 KB (40 bytes)
|
+--next [sun.misc.Cleaner] 1.33 KB (40 bytes)
| |
| +--next [sun.misc.Cleaner] 1.20 KB (40 bytes)
| | |
| | +--next [sun.misc.Cleaner] 1.06 KB (40 bytes)
| | | |
| | | +--next [sun.misc.Cleaner] 952 bytes (40 bytes)
| | | | |
| | | | +--next [sun.misc.Cleaner] 816 bytes (40 bytes)
| | | | | |
| | | | | +--next [sun.misc.Cleaner] 680 bytes (40 bytes)
| | | | | | |
| | | | | | +--next [sun.misc.Cleaner] 544 bytes (40 bytes)
| | | | | | | |
| | | | | | | +--next [sun.misc.Cleaner] 408 bytes (40 bytes)
| | | | | | | | |
| | | | | | | | +--next [sun.misc.Cleaner] 272 bytes (40 bytes)
| | | | | | | | | |
| | | | | | | | | +--next [sun.misc.Cleaner] 136 bytes (40 bytes)
| | | | | | | | | | |
| | | | | | | | | | +--thunk [java.nio.DirectByteBuffer$Deallocator] 32 bytes (32 bytes)
| | | | | | | | | | |
| | | | | | | | | | +--referent [java.nio.DirectByteBuffer] 64 bytes (64 bytes)
| | | | | | | | | |
| | | | | | | | | +--thunk [java.nio.DirectByteBuffer$Deallocator] 32 bytes (32 bytes)
| | | | | | | | | |
| | | | | | | | | +--referent [java.nio.DirectByteBuffer] 64 bytes (64 bytes)
| | | | | | | | |
| | | | | | | | +--thunk [java.nio.DirectByteBuffer$Deallocator] 32 bytes (32 bytes)
| | | | | | | | |
| | | | | | | | +--referent [java.nio.DirectByteBuffer] 64 bytes (64 bytes)
| | | | | | | |
| | | | | | | +--thunk [java.nio.DirectByteBuffer$Deallocator] 32 bytes (32 bytes)
| | | | | | | |
| | | | | | | +--referent [java.nio.DirectByteBuffer] 64 bytes (64 bytes)
| | | | | | |
| | | | | | +--thunk [java.nio.DirectByteBuffer$Deallocator] 32 bytes (32 bytes)
| | | | | | |
| | | | | | +--referent [java.nio.DirectByteBuffer] 64 bytes (64 bytes)
| | | | | |
| | | | | +--thunk [java.nio.DirectByteBuffer$Deallocator] 32 bytes (32 bytes)
| | | | | |
| | | | | +--referent [java.nio.DirectByteBuffer] 64 bytes (64 bytes)
| | | | |
| | | | +--thunk [java.nio.DirectByteBuffer$Deallocator] 32 bytes (32 bytes)
| | | | |
| | | | +--referent [java.nio.DirectByteBuffer] 64 bytes (64 bytes)
| | | |
| | | +--thunk [java.nio.DirectByteBuffer$Deallocator] 32 bytes (32 bytes)
| | | |
| | | +--referent [java.nio.DirectByteBuffer] 64 bytes (64 bytes)
| | |
| | +--thunk [java.nio.DirectByteBuffer$Deallocator] 32 bytes (32 bytes)
| | |
| | +--referent [java.nio.DirectByteBuffer] 64 bytes (64 bytes)
| |
| +--thunk [java.nio.DirectByteBuffer$Deallocator] 32 bytes (32 bytes)
| |
| +--referent [java.nio.DirectByteBuffer] 64 bytes (64 bytes)
|
+--thunk [java.nio.DirectByteBuffer$Deallocator] 32 bytes (32 bytes)
|
+--queue [java.lang.ref.ReferenceQueue] 48 bytes (32 bytes)
|
+--lock [java.lang.ref.ReferenceQueue$Lock] 16 bytes (16 bytes)
1544
{code}
Ideally, it would be beneficial to switch to jamm 0.4.0 (used in 5.0), which
covers such cases
([https://github.com/jbellis/jamm/blob/aa95057962b18a436ff12ddd9f5d0e9a3da74ae2/src/org/github/jamm/Filters.java#L70C37-L70C58]),
but I suppose such third-party upgrades are not allowed within the 4.0/4.1
branches.
With the current MemoryMeter:
* if omitSharedBufferOverhead is disabled, it starts to traverse and include
the heap usage of unpredictable global Cleaner/ReferenceQueue graphs
* if omitSharedBufferOverhead is enabled, it includes the direct buffer
capacity in the measured memory usage

So I have not found a better way than using the omitSharedBufferOverhead mode
(which does not traverse the "att" and "cleaner" references of DirectByteBuffer
objects but includes the buffer capacity in the measured result) and then
correcting the heap usage measured in the test by subtracting the total size of
data within the DirectBuffers (the off-heap allocation).
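As a rough illustration of that correction (a plain-JDK sketch with made-up numbers; this is not the actual test code, and no Cassandra or jamm API is used): in omitSharedBufferOverhead mode the deep measurement is effectively "on-heap overhead + sum of direct buffer capacities", so subtracting the known off-heap total leaves only the on-heap part:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class CorrectionSketch {
    public static void main(String[] args) {
        // Stand-ins for the memtable's off-heap cell buffers.
        List<ByteBuffer> buffers = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            buffers.add(ByteBuffer.allocateDirect(1024));
        }
        // Total off-heap capacity the test already knows from the allocator accounting.
        long offHeapBytes = buffers.stream().mapToLong(ByteBuffer::capacity).sum();

        // Hypothetical measureDeep() result in omitSharedBufferOverhead mode:
        // on-heap object overhead plus every visited buffer's capacity.
        long onHeapOverhead = 50_000L; // illustrative number, not a real measurement
        long measuredDeep = onHeapOverhead + offHeapBytes;

        // The correction applied in the test change: subtract the off-heap total
        // so the value is comparable with the allocator's on-heap number.
        long correctedHeapUsage = measuredDeep - offHeapBytes;
        System.out.println(correctedHeapUsage); // prints 50000
    }
}
```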
I have submitted the change to the test logic in
[https://github.com/apache/cassandra/pull/3503/|https://github.com/apache/cassandra/pull/3503/commits]
[~smiklosovic] would it be possible to run the repeated tests again for
org.apache.cassandra.db.memtable.MemtableSizeOffheapBuffersTest on the updated
4.0 PR?
> Test Failure: org.apache.cassandra.cql3.MemtableSizeTest
> --------------------------------------------------------
>
> Key: CASSANDRA-17298
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17298
> Project: Cassandra
> Issue Type: Bug
> Components: Test/unit
> Reporter: Josh McKenzie
> Assignee: Dmitry Konstantinov
> Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: analyzed_objects.svg, structure_example.svg
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> [https://ci-cassandra.apache.org/job/Cassandra-4.0/313/testReport/org.apache.cassandra.cql3/MemtableSizeTest/testTruncationReleasesLogSpace_2/]
> Failed 4 times in the last 30 runs. Flakiness: 27%, Stability: 86%
> Error Message
> Expected heap usage close to 49.930MiB, got 41.542MiB.
> {code}
> Stacktrace
> junit.framework.AssertionFailedError: Expected heap usage close to 49.930MiB, got 41.542MiB.
> at org.apache.cassandra.cql3.MemtableSizeTest.testSize(MemtableSizeTest.java:130)
> at org.apache.cassandra.Util.runCatchingAssertionError(Util.java:644)
> at org.apache.cassandra.Util.flakyTest(Util.java:669)
> at org.apache.cassandra.cql3.MemtableSizeTest.testTruncationReleasesLogSpace(MemtableSizeTest.java:61)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}
> *UPDATE:* It was discovered that unit tests were running with
> memtable_allocation_type: offheap_objects when we ship C* with heap_buffers.
> So we changed that in CASSANDRA-19326, now we test with
> memtable_allocation_type: heap_buffers. As a result, this test now fails all
> the time on 4.0 and 4.1.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)