[
https://issues.apache.org/jira/browse/CASSANDRA-21220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18066167#comment-18066167
]
Sam Lightfoot edited comment on CASSANDRA-21220 at 3/16/26 6:22 PM:
--------------------------------------------------------------------
I had a quick look at the setup here: _jvm-dtest_ runs on medium nodes with a
5GB memory limit, yet build.xml L1958 sets _Xmx8G_ for _jvm-dtest_. I assume
the tests on JDKs earlier than 21 are fine because they use G1 GC, whereas ZGC
typically has a higher memory footprint, which is likely what causes the
intermittent OOM.
We could try reducing Xmx to fit within the memory limit, or alternatively run
_jvm-dtest_ on large nodes, similar to _dtest_. Regardless of which we choose,
Xmx should be less than the memory limit of the medium nodes.
_Trace_
*Step 1* — {{build.xml}} (lines 1950-1959): jvm-dtest sets {{-Xmx8G}}
{code:xml}
<target name="test-jvm-dtest" ...>
<jvmarg value="-Xmx8G"/>
</target>
{code}
*Step 2* — {{Jenkinsfile}} (line 197): no {{size}} specified
{code:groovy}
'jvm-dtest': [splits: 16],
{code}
*Step 3* — {{Jenkinsfile}} (lines 216-217): defaults to {{'medium'}}
{code:groovy}
if (!it.value['size']) {
it.value.put('size', 'medium')
}
{code}
*Step 4* — {{Jenkinsfile}} (line 504): node label becomes
{{cassandra-amd64-medium}}
{code:groovy}
def label = "cassandra-${cell.arch}-${command.size}"
{code}
*Step 5* — {{jenkins-deployment.yaml}} (lines 285-286): medium dind container
limit
{code:yaml}
resourceRequestMemory: 3400M
resourceLimitMemory: 5G
{code}
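The five steps above can be replayed as a small check. This is only a sketch under stated assumptions: the values are copied from the snippets in the trace, while the helper names ({{xmx_bytes}}, {{k8s_bytes}}, {{node_label}}) and the dictionaries are hypothetical and not part of the Cassandra build. It also assumes the {{5G}} limit is read as a Kubernetes quantity (decimal, 10^9 bytes per G), whereas the JVM treats the {{G}} in {{-Xmx8G}} as binary (1024^3 bytes).

```python
import re

# JVM size suffixes are binary: -Xmx8G means 8 * 1024**3 bytes.
JVM_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}
# Kubernetes quantities: "G"/"M" are decimal, "Gi"/"Mi" are binary.
K8S_UNITS = {"M": 10**6, "G": 10**9, "Mi": 1024**2, "Gi": 1024**3}


def xmx_bytes(jvmarg):
    """Parse a -Xmx<n>[k|m|g] argument into bytes (hypothetical helper)."""
    match = re.fullmatch(r"-Xmx(\d+)([kKmMgG]?)", jvmarg)
    if match is None:
        raise ValueError(f"not an -Xmx argument: {jvmarg!r}")
    number, suffix = match.groups()
    return int(number) * JVM_UNITS.get(suffix.lower(), 1)


def k8s_bytes(quantity):
    """Parse a subset of Kubernetes memory quantities into bytes (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)(Mi|Gi|M|G)?", quantity)
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, suffix = match.groups()
    return int(number) * K8S_UNITS.get(suffix, 1)


def node_label(task, arch="amd64"):
    """Mirror steps 3 and 4: default the size to 'medium', then build the label."""
    size = task.get("size") or "medium"
    return f"cassandra-{arch}-{size}"


# Step 2: the jvm-dtest entry has no explicit size.
tasks = {"jvm-dtest": {"splits": 16}}
# Step 5: memory limit of the medium dind container.
limits = {"cassandra-amd64-medium": "5G"}

label = node_label(tasks["jvm-dtest"])
heap = xmx_bytes("-Xmx8G")          # Step 1: value from build.xml
limit = k8s_bytes(limits[label])

print(f"{label}: heap={heap} bytes, container limit={limit} bytes")
if heap > limit:
    print("Configured max heap exceeds the container memory limit")
```

The comparison is deliberately conservative: it only looks at the maximum heap, so any off-heap or GC overhead would widen the gap further.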
> Test failure: org.apache.cassandra.distributed.test.SinglePartitionReadCommandTest
> ----------------------------------------------------------------------------------
>
> Key: CASSANDRA-21220
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21220
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Test/dtest/java
> Reporter: Sam Tunnicliffe
> Priority: Normal
>
> See:
> https://butler.cassandra.apache.org/#/ci/upstream/workflow/Cassandra-trunk/failure/org.apache.cassandra.distributed.test/SinglePartitionReadCommandTest/testNonCompactTableWithOnlyUpdatedColumnOnOneNodeAndColumnDeletionOnTheOther
> This seems to be failing due to OOM on a semi-regular basis, on jdk21 only.
> Typical stacktrace:
> {code}
> org.apache.cassandra.distributed.shared.ShutdownException: Uncaught exceptions were thrown during test
>     at org.apache.cassandra.distributed.impl.AbstractCluster.checkAndResetUncaughtExceptions(AbstractCluster.java:1218)
>     at org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1203)
>     at org.apache.cassandra.distributed.test.SinglePartitionReadCommandTest.testNonCompactTableWithOnlyUpdatedColumnOnOneNodeAndColumnDeletionOnTheOther(SinglePartitionReadCommandTest.java:56)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:75)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:52)
>     Suppressed: java.lang.OutOfMemoryError: Java heap space
>         at java.base/java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:71)
>         at java.base/java.nio.ByteBuffer.allocate(ByteBuffer.java:391)
>         at org.apache.cassandra.utils.memory.SlabAllocator.getRegion(SlabAllocator.java:138)
>         at org.apache.cassandra.utils.memory.SlabAllocator.allocate(SlabAllocator.java:103)
>         at org.apache.cassandra.utils.memory.MemtableBufferAllocator$MemtableByteBufferCloner.allocate(MemtableBufferAllocator.java:61)
>         at org.apache.cassandra.utils.memory.ByteBufferCloner.clone(ByteBufferCloner.java:100)
>         at org.apache.cassandra.utils.memory.ByteBufferCloner.clone(ByteBufferCloner.java:86)
>         at org.apache.cassandra.utils.memory.ByteBufferCloner.clone(ByteBufferCloner.java:47)
>         at org.apache.cassandra.db.memtable.SkipListMemtable.put(SkipListMemtable.java:125)
>         at org.apache.cassandra.db.memtable.Memtable.put(Memtable.java:187)
>         at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1511)
>         at org.apache.cassandra.db.CassandraTableWriteHandler.write(CassandraTableWriteHandler.java:38)
>         at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:579)
>         at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:434)
>         at org.apache.cassandra.db.Mutation.apply(Mutation.java:297)
>         at org.apache.cassandra.db.Mutation.apply(Mutation.java:317)
>         at org.apache.cassandra.cql3.statements.ModificationStatement.executeInternalWithoutCondition(ModificationStatement.java:846)
>         at org.apache.cassandra.cql3.statements.ModificationStatement.executeLocally(ModificationStatement.java:837)
>         at org.apache.cassandra.cql3.QueryProcessor.executeInternal(QueryProcessor.java:519)
>         at org.apache.cassandra.db.SystemKeyspace.updateCompactionHistory(SystemKeyspace.java:729)
>         at org.apache.cassandra.db.compaction.CompactionTask.updateCompactionHistory(CompactionTask.java:409)
>         at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:335)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:26)
>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:99)
>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:110)
>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:408)
>         at org.apache.cassandra.concurrent.FutureTask$3.call(FutureTask.java:141)
>         at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61)
>         at org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71)
>         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
>         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
>         at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>         Suppressed: [CIRCULAR REFERENCE: java.lang.OutOfMemoryError: Java heap space]
> {code}