[
https://issues.apache.org/jira/browse/CASSANDRA-20980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18038577#comment-18038577
]
Dmitry Konstantinov edited comment on CASSANDRA-20980 at 11/15/25 5:30 PM:
---------------------------------------------------------------------------
I've done few e2e tests for the changes by running Cassandra node built from
[https://github.com/apache/cassandra/pull/4454]
There are few non-committed debug logging changes to simplify analysis: logging
of listen MBeans + logging of all GC events):
[^debug_logging_gc_inspector.patch]
The behaviour looks as expected for me.
Used JDK:
{code:java}
openjdk version "17.0.15" 2025-04-15
OpenJDK Runtime Environment Temurin-17.0.15+6 (build 17.0.15+6)
{code}
I've run a few load tests to cause GC.
Default GC inspector limits were used:
gc_log_threshold = 200 ms
gc_warn_threshold = 1s
gc_concurrent_phase_log_threshold = 1s
gc_concurrent_phase_warn_threshold = 2s
h3. G1
[^system_g1.log] [^gc_g1.log]
MBeans
{code:java}
java.lang:name=G1 Young Generation,type=GarbageCollector
java.lang:name=G1 Old Generation,type=GarbageCollector
{code}
Pauses detection still works:
{code:java}
INFO [Notification Thread] 2025-11-15T16:19:13,080 GCInspector.java:312 - G1
Young Generation GC in 233ms. G1 Eden Space: 6476005376 -> 0; G1 Old Gen:
22573666304 -> 23790014464; G1 Survivor Space: 1258291200 -> 1073741824;
{code}
Note: we do not have an official support of JDK 21 as of now, I so have not
tested G1 concurrent mode, A new JMX bean for it has been added in JDK20
([https://bugs.openjdk.org/browse/JDK-8297754]).
h3. Shenandoah
[^system_shenandoah.log] [^gc_shenandoah.log]
MBeans
{code:java}
java.lang:name=Shenandoah Pauses,type=GarbageCollector
java.lang:name=Shenandoah Cycles,type=GarbageCollector
{code}
We log Shenandoah Cycles concurrent GCs with time > 1 sec as INFO:
{code:java}
INFO [Notification Thread] 2025-11-15T16:31:21,372 GCInspector.java:300 -
Shenandoah Cycles GC in 1762ms. CodeHeap 'non-nmethods': 2047744 -> 3201536;
CodeHeap 'non-profiled nmethods': 18649088 -> 18653952; CodeHeap 'profiled
nmethods': 19999488 -> 20063744; Metaspace: 63209472 -> 63218224; Shenandoah:
17291619784 -> 6818925104
{code}
We log Shenandoah Cycles concurrent GCs with time > 2 sec as WARN:
{code:java}
WARN [Notification Thread] 2025-11-15T16:30:53,812 GCInspector.java:298 -
Shenandoah Cycles GC in 3813ms. CodeHeap 'non-nmethods': 3778432 -> 2047744;
CodeHeap 'non-profiled nmethods': 18428800 -> 18444032; CodeHeap 'profiled
nmethods': 19460992 -> 19521024; Metaspace: 63072296 -> 63089704; Shenandoah:
22980826480 -> 22186858112
{code}
We do not log Shenandoah Cycles concurrent GCs with time < 1 sec and > 300 ms
(this is the main goal of this improvement):
{code:java}
// it is present only in debug output but not reported by GCInspector real
logging
INFO [Notification Thread] 2025-11-15T16:31:50,315 GCInspector.java:265 -
debug, gcId: 51, gcName: Shenandoah Cycles, gcCause: Concurrent GC, gcAction:
end of GC cycle, event duration: 349, calculated duration: 348
{code}
h3. ZGC
[^system_zgc.log] [^gc_zgc.log]
MBeans
{code:java}
java.lang:name=ZGC Cycles,type=GarbageCollector
java.lang:name=ZGC Pauses,type=GarbageCollector
{code}
We log ZGC Cycles concurrent GCs with time > 1 sec as INFO:
{code:java}
INFO [Notification Thread] 2025-11-15T16:38:53,877 GCInspector.java:300 - ZGC
Cycles GC in 1694ms. CodeHeap 'non-nmethods': 3414912 -> 1684224; CodeHeap
'non-profiled nmethods': 13763712 -> 14028160; CodeHeap 'profiled nmethods':
19596416 -> 18350848; Compressed Class Space: 7549152 -> 7551488; Metaspace:
62210520 -> 62218096; ZHeap: 6757023744 -> 9124708352
{code}
We log ZGC Cycles concurrent GCs with time > 2 sec as WARN:
{code:java}
WARN [Notification Thread] 2025-11-15T16:38:56,746 GCInspector.java:298 - ZGC
Cycles GC in 2660ms. CodeHeap 'non-nmethods': 3414912 -> 1684224; CodeHeap
'non-profiled nmethods': 14354048 -> 14567424; CodeHeap 'profiled nmethods':
18724352 -> 18601856; Compressed Class Space: 7556032 -> 7556544; Metaspace:
62223720 -> 62262456; ZHeap: 10284433408 -> 17983078400
{code}
We do not log ZGC Cycles concurrent GCs with time < 1 sec and > 300 ms (this is
the main goal of this improvement):
{code:java}
// it is present only in debug output but not reported by GCInspector real
logging
INFO [Notification Thread] 2025-11-15T16:42:49,461 GCInspector.java:265 -
debug, gcId: 50, gcName: ZGC Cycles, gcCause: Allocation Rate, gcAction: end of
GC cycle, event duration: 379, calculated duration: 380
{code}
was (Author: dnk):
I've done few e2e tests for the changes (by running Cassandra node built from
[https://github.com/apache/cassandra/pull/4454] with few debug logging changes
to simplify analysis: logging of listen MBeans + logging of all GC events).
The behaviour looks as expected for me.
Used JDK:
{code:java}
openjdk version "17.0.15" 2025-04-15
OpenJDK Runtime Environment Temurin-17.0.15+6 (build 17.0.15+6)
{code}
I've run a few load tests to cause GC.
Default GC inspector limits were used:
gc_log_threshold = 200 ms
gc_warn_threshold = 1s
gc_concurrent_phase_log_threshold = 1s
gc_concurrent_phase_warn_threshold = 2s
h3. G1
[^system_g1.log] [^gc_g1.log]
MBeans
{code:java}
java.lang:name=G1 Young Generation,type=GarbageCollector
java.lang:name=G1 Old Generation,type=GarbageCollector
{code}
Pauses detection still works:
{code:java}
INFO [Notification Thread] 2025-11-15T16:19:13,080 GCInspector.java:312 - G1
Young Generation GC in 233ms. G1 Eden Space: 6476005376 -> 0; G1 Old Gen:
22573666304 -> 23790014464; G1 Survivor Space: 1258291200 -> 1073741824;
{code}
Note: we do not have an official support of JDK 21 as of now, I so have not
tested G1 concurrent mode, A new JMX bean for it has been added in JDK20
([https://bugs.openjdk.org/browse/JDK-8297754]).
h3. Shenandoah
[^system_shenandoah.log] [^gc_shenandoah.log]
MBeans
{code:java}
java.lang:name=Shenandoah Pauses,type=GarbageCollector
java.lang:name=Shenandoah Cycles,type=GarbageCollector
{code}
We log Shenandoah Cycles concurrent GCs with time > 1 sec as INFO:
{code:java}
INFO [Notification Thread] 2025-11-15T16:31:21,372 GCInspector.java:300 -
Shenandoah Cycles GC in 1762ms. CodeHeap 'non-nmethods': 2047744 -> 3201536;
CodeHeap 'non-profiled nmethods': 18649088 -> 18653952; CodeHeap 'profiled
nmethods': 19999488 -> 20063744; Metaspace: 63209472 -> 63218224; Shenandoah:
17291619784 -> 6818925104
{code}
We log Shenandoah Cycles concurrent GCs with time > 2 sec as WARN:
{code:java}
WARN [Notification Thread] 2025-11-15T16:30:53,812 GCInspector.java:298 -
Shenandoah Cycles GC in 3813ms. CodeHeap 'non-nmethods': 3778432 -> 2047744;
CodeHeap 'non-profiled nmethods': 18428800 -> 18444032; CodeHeap 'profiled
nmethods': 19460992 -> 19521024; Metaspace: 63072296 -> 63089704; Shenandoah:
22980826480 -> 22186858112
{code}
We do not log Shenandoah Cycles concurrent GCs with time < 1 sec and > 300 ms
(this is the main goal of this improvement):
{code:java}
// it is present only in debug output but not reported by GCInspector real
logging
INFO [Notification Thread] 2025-11-15T16:31:50,315 GCInspector.java:265 -
debug, gcId: 51, gcName: Shenandoah Cycles, gcCause: Concurrent GC, gcAction:
end of GC cycle, event duration: 349, calculated duration: 348
{code}
h3. ZGC
[^system_zgc.log] [^gc_zgc.log]
MBeans
{code:java}
java.lang:name=ZGC Cycles,type=GarbageCollector
java.lang:name=ZGC Pauses,type=GarbageCollector
{code}
We log ZGC Cycles concurrent GCs with time > 1 sec as INFO:
{code:java}
INFO [Notification Thread] 2025-11-15T16:38:53,877 GCInspector.java:300 - ZGC
Cycles GC in 1694ms. CodeHeap 'non-nmethods': 3414912 -> 1684224; CodeHeap
'non-profiled nmethods': 13763712 -> 14028160; CodeHeap 'profiled nmethods':
19596416 -> 18350848; Compressed Class Space: 7549152 -> 7551488; Metaspace:
62210520 -> 62218096; ZHeap: 6757023744 -> 9124708352
{code}
We log ZGC Cycles concurrent GCs with time > 2 sec as WARN:
{code:java}
WARN [Notification Thread] 2025-11-15T16:38:56,746 GCInspector.java:298 - ZGC
Cycles GC in 2660ms. CodeHeap 'non-nmethods': 3414912 -> 1684224; CodeHeap
'non-profiled nmethods': 14354048 -> 14567424; CodeHeap 'profiled nmethods':
18724352 -> 18601856; Compressed Class Space: 7556032 -> 7556544; Metaspace:
62223720 -> 62262456; ZHeap: 10284433408 -> 17983078400
{code}
We do not log ZGC Cycles concurrent GCs with time < 1 sec and > 300 ms (this is
the main goal of this improvement):
{code:java}
// it is present only in debug output but not reported by GCInspector real
logging
INFO [Notification Thread] 2025-11-15T16:42:49,461 GCInspector.java:265 -
debug, gcId: 50, gcName: ZGC Cycles, gcCause: Allocation Rate, gcAction: end of
GC cycle, event duration: 379, calculated duration: 380
{code}
> GCInspector should use different thresholds on GC events for warning messages
> -----------------------------------------------------------------------------
>
> Key: CASSANDRA-20980
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20980
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: Observability/Logging
> Reporter: Yuqi Yan
> Assignee: Yuqi Yan
> Priority: Normal
> Fix For: 5.x
>
> Attachments: debug_logging_gc_inspector.patch, gc_g1.log,
> gc_shenandoah.log, gc_zgc.log, system_g1.log, system_shenandoah.log,
> system_zgc.log
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Running Cassandra 4.1 and recently testing with ShenandoahGC.
> Noticed that the log message is flooded with the warning message coming from
> GCInspector.
> Taking a closer look, that seems that we're warning on every GC event that
> takes longer than 1 second (default). And from document that the intention
> was to monitor GC pause time.
> But for ShenandoahGC and ZGC, there are concurrent phases (no STW) and phases
> that actually STW.
> For those concurrent phases, it's expected to take longer time (if we're on
> the non-generational version), and I feel that these phases:
> 1. require STW
> 2. doesn't require STW
> should use different thresholds in warning message.
> At least to me, phases require STW actually pause the thread, and should use
> lower threshold compared to those concurrent phases.
> Will share a patch soon
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]