[ 
https://issues.apache.org/jira/browse/CASSANDRA-17298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880116#comment-17880116
 ] 

Dmitry Konstantinov edited comment on CASSANDRA-17298 at 9/8/24 11:55 AM:
--------------------------------------------------------------------------

So far I cannot reproduce it locally (macOS): I executed similar 
logic (without a container) 200 times using AdoptOpenJDK 1.8.0_265 and 200 
times using AdoptOpenJDK 11.0.11, and all of the runs were successful:
{code:bash}
count=200
for i in $(seq -w 1 $count)
do
  echo "Running test org.apache.cassandra.db.memtable.MemtableSizeOffheapBuffersTest, iteration $i of $count"
  ant testsome -Dtest.name=org.apache.cassandra.db.memtable.MemtableSizeOffheapBuffersTest -Dno-build-test=true
  mkdir -p build/repeated_tests/$i
  mv build/test/output/TEST-org.apache.cassandra.db.memtable.MemtableSizeOffheapBuffersTest.xml build/repeated_tests/$i
done
{code}
Observations based on the provided logs: there is a fluctuation in the memory 
measured within the test using MemoryMeter.measureDeep(memtable). In successful 
runs on CI (as well as locally in my case) it is ~79MiB; in case of a 
failure it has a higher value, around ~86MiB:
{code:java}
[junit-timeout] INFO  [main] 2024-09-07 12:03:39,024 MemtableSizeTestBase.java:176 - Expected heap usage close to 78.755MiB, got 78.754MiB, 0.797KiB difference. Delta per partition: 0.01 bytes
{code}
{code:java}
[junit-timeout] INFO  [main] 2024-09-07 12:03:58,623 MemtableSizeTestBase.java:176 - Expected heap usage close to 85.836MiB, got 78.878MiB, 6.957MiB difference. Delta per partition: 112.24 bytes
{code}
The value reported by the allocator logic (which is what we actually test) is 
consistent across all runs.

Taking into account that the issue happens only for the offheap_buffers mode, I 
suspect the cause may be unexpected shared objects which MemoryMeter reaches by 
traversing from DirectByteBuffer objects through their "att" and "cleaner" 
references.

An experiment which shows that this is possible:
{code:java}
import java.nio.ByteBuffer;

import org.junit.Test;
import org.github.jamm.MemoryMeter;

@Test
public void test() {

    // allocate unrelated direct buffers first; their Cleaners are linked into
    // the JVM-global Cleaner list that our buffer's Cleaner also joins
    for (int i = 0; i < 10; i++) {
        ByteBuffer unrelated = ByteBuffer.allocateDirect(1);
    }

    ByteBuffer byteBuffer = ByteBuffer.allocateDirect(10);

    MemoryMeter meter = new MemoryMeter().withGuessing(MemoryMeter.Guess.FALLBACK_UNSAFE)
                                         .enableDebug(100)
                                         //.omitSharedBufferOverhead()
                                         .ignoreNonStrongReferences()
                                         .ignoreKnownSingletons();

    // measureDeep() walks the Cleaner chain and counts the unrelated buffers too
    System.out.println(meter.measureDeep(byteBuffer));
}
{code}
{code:java}
root [java.nio.DirectByteBuffer] 1.51 KB (64 bytes)
  |
  +--cleaner [sun.misc.Cleaner] 1.45 KB (40 bytes)
    |
    +--next [sun.misc.Cleaner] 1.33 KB (40 bytes)
    |  |
    |  +--next [sun.misc.Cleaner] 1.20 KB (40 bytes)
    |  |  |
    |  |  +--next [sun.misc.Cleaner] 1.06 KB (40 bytes)
    |  |  |  |
    |  |  |  +--next [sun.misc.Cleaner] 952 bytes (40 bytes)
    |  |  |  |  |
    |  |  |  |  +--next [sun.misc.Cleaner] 816 bytes (40 bytes)
    |  |  |  |  |  |
    |  |  |  |  |  +--next [sun.misc.Cleaner] 680 bytes (40 bytes)
    |  |  |  |  |  |  |
    |  |  |  |  |  |  +--next [sun.misc.Cleaner] 544 bytes (40 bytes)
    |  |  |  |  |  |  |  |
    |  |  |  |  |  |  |  +--next [sun.misc.Cleaner] 408 bytes (40 bytes)
    |  |  |  |  |  |  |  |  |
    |  |  |  |  |  |  |  |  +--next [sun.misc.Cleaner] 272 bytes (40 bytes)
    |  |  |  |  |  |  |  |  |  |
    |  |  |  |  |  |  |  |  |  +--next [sun.misc.Cleaner] 136 bytes (40 bytes)
    |  |  |  |  |  |  |  |  |  |  |
    |  |  |  |  |  |  |  |  |  |  +--thunk [java.nio.DirectByteBuffer$Deallocator] 32 bytes (32 bytes)
    |  |  |  |  |  |  |  |  |  |  |
    |  |  |  |  |  |  |  |  |  |  +--referent [java.nio.DirectByteBuffer] 64 bytes (64 bytes)
    |  |  |  |  |  |  |  |  |  |
    |  |  |  |  |  |  |  |  |  +--thunk [java.nio.DirectByteBuffer$Deallocator] 32 bytes (32 bytes)
    |  |  |  |  |  |  |  |  |  |
    |  |  |  |  |  |  |  |  |  +--referent [java.nio.DirectByteBuffer] 64 bytes (64 bytes)
    |  |  |  |  |  |  |  |  |
    |  |  |  |  |  |  |  |  +--thunk [java.nio.DirectByteBuffer$Deallocator] 32 bytes (32 bytes)
    |  |  |  |  |  |  |  |  |
    |  |  |  |  |  |  |  |  +--referent [java.nio.DirectByteBuffer] 64 bytes (64 bytes)
    |  |  |  |  |  |  |  |
    |  |  |  |  |  |  |  +--thunk [java.nio.DirectByteBuffer$Deallocator] 32 bytes (32 bytes)
    |  |  |  |  |  |  |  |
    |  |  |  |  |  |  |  +--referent [java.nio.DirectByteBuffer] 64 bytes (64 bytes)
    |  |  |  |  |  |  |
    |  |  |  |  |  |  +--thunk [java.nio.DirectByteBuffer$Deallocator] 32 bytes (32 bytes)
    |  |  |  |  |  |  |
    |  |  |  |  |  |  +--referent [java.nio.DirectByteBuffer] 64 bytes (64 bytes)
    |  |  |  |  |  |
    |  |  |  |  |  +--thunk [java.nio.DirectByteBuffer$Deallocator] 32 bytes (32 bytes)
    |  |  |  |  |  |
    |  |  |  |  |  +--referent [java.nio.DirectByteBuffer] 64 bytes (64 bytes)
    |  |  |  |  |
    |  |  |  |  +--thunk [java.nio.DirectByteBuffer$Deallocator] 32 bytes (32 bytes)
    |  |  |  |  |
    |  |  |  |  +--referent [java.nio.DirectByteBuffer] 64 bytes (64 bytes)
    |  |  |  |
    |  |  |  +--thunk [java.nio.DirectByteBuffer$Deallocator] 32 bytes (32 bytes)
    |  |  |  |
    |  |  |  +--referent [java.nio.DirectByteBuffer] 64 bytes (64 bytes)
    |  |  |
    |  |  +--thunk [java.nio.DirectByteBuffer$Deallocator] 32 bytes (32 bytes)
    |  |  |
    |  |  +--referent [java.nio.DirectByteBuffer] 64 bytes (64 bytes)
    |  |
    |  +--thunk [java.nio.DirectByteBuffer$Deallocator] 32 bytes (32 bytes)
    |  |
    |  +--referent [java.nio.DirectByteBuffer] 64 bytes (64 bytes)
    |
    +--thunk [java.nio.DirectByteBuffer$Deallocator] 32 bytes (32 bytes)
    |
    +--queue [java.lang.ref.ReferenceQueue] 48 bytes (32 bytes)
      |
      +--lock [java.lang.ref.ReferenceQueue$Lock] 16 bytes (16 bytes)
1544
{code}
The chain of "next" references appears because sun.misc.Cleaner instances form 
a JVM-global doubly-linked list, so traversing from one buffer's Cleaner pulls 
in the Cleaners (and referent buffers) of unrelated allocations. Ideally we 
would switch to jamm 0.4.0 (used in 5.0), which covers such cases 
([https://github.com/jbellis/jamm/blob/aa95057962b18a436ff12ddd9f5d0e9a3da74ae2/src/org/github/jamm/Filters.java#L70C37-L70C58]), 
but I suppose such 3rd-party upgrades are not allowed within the 4.0/4.1 
branches.
With the current MemoryMeter version:
 * if omitSharedBufferOverhead is disabled, it traverses and includes heap usage for the unpredictable global Cleaner/ReferenceQueue graphs
 * if omitSharedBufferOverhead is enabled, it includes the direct buffer capacity in the measured memory usage (see the sketch below)
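
A minimal sketch of the second point, assuming the jamm 0.3.x API used in the experiment above (the 1024-byte buffer is just an example):
{code:java}
import java.nio.ByteBuffer;

import org.github.jamm.MemoryMeter;

MemoryMeter meter = new MemoryMeter().withGuessing(MemoryMeter.Guess.FALLBACK_UNSAFE)
                                     .omitSharedBufferOverhead();

ByteBuffer direct = ByteBuffer.allocateDirect(1024);

// the Cleaner graph is not traversed here, but the 1024 bytes of off-heap
// capacity are counted on top of the ~64-byte shallow DirectByteBuffer size,
// even though they do not live on the heap
System.out.println(meter.measureDeep(direct));
{code}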

So I have not found a better option than using the omitSharedBufferOverhead mode 
(which does not traverse the "att" and "cleaner" references of DirectByteBuffer 
objects but includes the buffer capacity in the measured result) and then 
correcting the heap usage measured in the test by subtracting the total size of 
the data held in direct buffers (the off-heap allocation), as sketched below.
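
A rough sketch of that correction, where memtable is the memtable under test and getAllocator().offHeap().owns() is my assumption of the 4.0 allocator accessors; the exact code in the PR may differ:
{code:java}
MemoryMeter meter = new MemoryMeter().withGuessing(MemoryMeter.Guess.FALLBACK_UNSAFE)
                                     .ignoreNonStrongReferences()
                                     .ignoreKnownSingletons()
                                     .omitSharedBufferOverhead();

// the deep size now counts each DirectByteBuffer's capacity instead of
// walking its "att"/"cleaner" references...
long deepSize = meter.measureDeep(memtable);

// ...so subtract the off-heap bytes owned by the memtable's allocator
// (assumed accessor) to recover the pure on-heap usage
long heapUsage = deepSize - memtable.getAllocator().offHeap().owns();
{code}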

I have submitted the change to the test logic in 
[https://github.com/apache/cassandra/pull/3503/|https://github.com/apache/cassandra/pull/3503/commits]

[~smiklosovic] would it be possible to run the repeated tests for 
org.apache.cassandra.db.memtable.MemtableSizeOffheapBuffersTest again with the 
updated 4.0 PR?


> Test Failure: org.apache.cassandra.cql3.MemtableSizeTest
> --------------------------------------------------------
>
>                 Key: CASSANDRA-17298
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17298
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/unit
>            Reporter: Josh McKenzie
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x
>
>         Attachments: analyzed_objects.svg, structure_example.svg
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://ci-cassandra.apache.org/job/Cassandra-4.0/313/testReport/org.apache.cassandra.cql3/MemtableSizeTest/testTruncationReleasesLogSpace_2/]
>  Failed 4 times in the last 30 runs. Flakiness: 27%, Stability: 86%
> Error Message
> Expected heap usage close to 49.930MiB, got 41.542MiB.
> {code}
> Stacktrace
> junit.framework.AssertionFailedError: Expected heap usage close to 49.930MiB, got 41.542MiB.
>       at org.apache.cassandra.cql3.MemtableSizeTest.testSize(MemtableSizeTest.java:130)
>       at org.apache.cassandra.Util.runCatchingAssertionError(Util.java:644)
>       at org.apache.cassandra.Util.flakyTest(Util.java:669)
>       at org.apache.cassandra.cql3.MemtableSizeTest.testTruncationReleasesLogSpace(MemtableSizeTest.java:61)
>       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  {code}
> *UPDATE:* It was discovered that unit tests were running with memtable_allocation_type: offheap_objects, whereas we ship C* with heap_buffers.
> So we changed that in CASSANDRA-19326; now we test with memtable_allocation_type: heap_buffers. As a result, this test now fails all the time on 4.0 and 4.1.


