[
https://issues.apache.org/jira/browse/CASSANDRA-21182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18060107#comment-18060107
]
Sam Lightfoot edited comment on CASSANDRA-21182 at 2/22/26 8:58 AM:
--------------------------------------------------------------------
Issue appears isolated to JDK21, as noted by [~dnk]. Further investigation
showed that ZGC, unlike G1, processes phantom references only after major
collections, which are not guaranteed to occur during the stress runs (hence
the 53% pass rate).
If we force a major GC before measuring memory for the two stress runs,
the deltas are an order of magnitude smaller:
Before fix (no GC before measurement)
|Run|1st stress (bytes)|2nd stress (bytes)|Delta|Result|
|----|-------------------:|-------------------:|------:|--------|
|1|408,645,570|210,461,651|-198,183,919|PASS|
|2|342,576,727|210,448,640|-132,128,087|PASS|
|3|210,442,895|210,453,783|+10,888|PASS|
|4|276,510,062|210,453,941|-66,056,121|PASS|
|5|474,713,129|210,460,019|-264,253,110|PASS|
|6|210,445,154|210,447,960|+2,806|PASS|
|7|210,451,442|474,717,606|+264,266,164|FAIL|
|8|276,504,812|210,445,837|-66,058,975|PASS|
|9|210,434,999|276,508,460|+66,073,461|FAIL|
|10|474,696,363|210,445,867|-264,250,496|PASS|
|11|276,509,359|210,457,173|-66,052,186|PASS|
|12|210,442,918|210,448,765|+5,847|PASS|
|13|408,677,558|210,456,306|-198,221,252|PASS|
|14|408,637,477|210,446,769|-198,190,708|PASS|
|15|474,704,815|342,578,822|-132,125,993|PASS|
|16|474,708,196|210,453,162|-264,255,034|PASS|
After fix (System.gc() before measurement)
|Run|1st stress (bytes)|2nd stress (bytes)|Delta|Result|
|----|-------------------:|-------------------:|------:|--------|
|1|210,447,275|210,455,775|+8,500|PASS|
|2|210,436,766|210,447,171|+10,405|PASS|
|3|210,440,717|210,445,675|+4,958|PASS|
|4|210,438,880|210,449,627|+10,747|PASS|
|5|210,443,597|210,449,431|+5,834|PASS|
|6|210,440,060|210,446,321|+6,261|PASS|
|7|210,445,318|210,446,761|+1,443|PASS|
|8|210,438,117|210,448,510|+10,393|PASS|
|9|210,442,032|210,447,989|+5,957|PASS|
|10|210,438,965|210,450,540|+11,575|PASS|
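A minimal sketch of the idea, not the actual patch: it assumes the collection is requested from outside the JVM with jcmd's GC.run command (which calls System.gc()) and that the node's pid is available. The helper names gc_command, force_gc and passes are hypothetical; only LARGE_COLUMN_SIZE and the comparison come from the test quoted below.

```python
import subprocess

# Threshold used by the dtest: 63 MiB = 66,060,288 bytes.
LARGE_COLUMN_SIZE = 1024 * 1024 * 63


def gc_command(pid):
    """Build the jcmd invocation that asks the target JVM to run System.gc()."""
    return ["jcmd", str(pid), "GC.run"]


def force_gc(pid):
    """Hypothetical helper: request a GC in the node's JVM before sampling.

    Assumes jcmd is on PATH and explicit GC is not disabled in the JVM.
    """
    subprocess.run(gc_command(pid), check=True)


def passes(first, second, limit=LARGE_COLUMN_SIZE):
    """Mirror the test's assertion: growth between runs must stay below limit."""
    return second - first < limit
```

With the figures above, passes(210434999, 276508460) is False (run 9 before the fix, delta 66,073,461), while passes(210447275, 210455775) is True (run 1 after the fix, delta 8,500).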
> Fix flaky DTest: largecolumn_test.TestLargeColumn
> -------------------------------------------------
>
> Key: CASSANDRA-21182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21182
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Local/Compaction
> Reporter: Sam Lightfoot
> Assignee: Sam Lightfoot
> Priority: Normal
> Fix For: 5.1
>
>
> Flakiness indicating more than 64MB of direct pool growth between stress tests
> {code:java}
> # Now run the full stack to warm up internal caches/pools
> LARGE_COLUMN_SIZE = 1024 * 1024 * 63
> self.stress_with_col_size(cluster, node1, LARGE_COLUMN_SIZE)
> after1stLargeStress = self.directbytes(node1)
> logger.info("After 1st large column stress, direct memory: {0}".format(after1stLargeStress))
> # Now run the full stack to see how much memory is allocated for the second "large" columns request
> self.stress_with_col_size(cluster, node1, LARGE_COLUMN_SIZE)
> after2ndLargeStress = self.directbytes(node1)
> logger.info("After 2nd large column stress, direct memory: {0}".format(after2ndLargeStress))
> # We may allocate direct memory proportional to size of a request
> # but we want to ensure that when we do subsequent calls the used direct memory is not growing
> diff = int(after2ndLargeStress) - int(after1stLargeStress)
> logger.info("Direct memory delta: {0}".format(diff))
> assert diff < LARGE_COLUMN_SIZE {code}
> [https://pre-ci.cassandra.apache.org/job/cassandra/410/testReport/junit/dtest.largecolumn_test/TestLargeColumn/Tests___dtest_jdk21_34_64___test_cleanup/history/]
> {code:java}
> assert 66073403 < 66060288 {code}
> This could be a real issue - this amount of growth is unexpected.
> Appears it may be isolated to JDK21:
> * [https://pre-ci.cassandra.apache.org/job/cassandra/412/testReport/junit/dtest.largecolumn_test/TestLargeColumn/Tests___dtest_jdk21_34_64___test_cleanup/]
> * [https://pre-ci.cassandra.apache.org/job/cassandra/410/testReport/junit/dtest.largecolumn_test/TestLargeColumn/Tests___dtest_jdk21_34_64___test_cleanup/]
>