Ryan Blough created HDDS-14106:
----------------------------------
Summary: Set -XX:NewRatio=3 explicitly when using
ConcurrentMarkSweep GC
Key: HDDS-14106
URL: https://issues.apache.org/jira/browse/HDDS-14106
Project: Apache Ozone
Issue Type: Bug
Affects Versions: 1.4.1, 2.0.0, 1.4.0, 2.1.0
Reporter: Ryan Blough
The Young Generation heap size is chronically too low when using the
ConcurrentMarkSweep garbage collector.
This is actually expected behavior, present in OpenJDK since JDK 6 (the
earliest reference I found is this OpenJDK issue:
[https://bugs.openjdk.org/browse/JDK-6872335], so it predates Ozone). The root
cause is that instead of honoring the default NewRatio value of 2, the JVM
calculates an ergonomic value based on the number of parallel GC threads and
the value of the CMSYoungGenPerWorker flag, plus some other adjustments. The
original goal was to ensure that a parallel young-generation collection never
ran long (more than a second, say); but the logic was written when 4 was a
high number of cores to have available on a server.
In light of modern hardware, this logic is entirely obsolete. The known
solution is to explicitly set the NewRatio value in the Java options,
overriding the ergonomic calculation.
I recommend a value of -XX:NewRatio=3, resulting in a young generation heap
size of 25% of the total heap size. I chose NewRatio=3 because Ozone operations
do not appear to require a full 33% of the heap for Young Gen activity, and
because we don't want to unduly reduce the size of the remainder of the heap
during an upgrade or similar.
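To make the arithmetic concrete: NewRatio is the ratio of old generation to young generation, so the young generation receives roughly 1/(NewRatio+1) of the heap. The following is a minimal sketch only; the class and method names are hypothetical, the 92GB heap is used purely as an example figure, and the real JVM applies alignment and other adjustments on top of this.

```java
public class NewRatioMath {
    /** Approximate young generation size implied by -XX:NewRatio=n: heap / (n + 1). */
    static long youngGenBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1L);
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        long heap = 92 * gb; // example heap size, not a recommendation
        for (int ratio : new int[] {2, 3}) {
            System.out.printf("NewRatio=%d -> young gen ~ %.1f GB%n",
                ratio, (double) youngGenBytes(heap, ratio) / gb);
        }
    }
}
```

With NewRatio=3 a 92GB heap works out to roughly 23GB of young generation, and with the default NewRatio=2 to roughly a third of the heap.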
Importantly, we should only set this when -XX:+UseConcMarkSweepGC is _also_
set; this problem does not impact other GC algorithms, and some of them
(including G1GC, as I understand it) rely on dynamically adjusting the Young
Generation as part of the algorithm.
Using NewRatio for our defaults is preferred to NewSize and MaxNewSize because
it dynamically adjusts with the total heap size changing, instead of requiring
manual Java option changes every time.
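The conditional logic proposed above could look something like the sketch below. This is illustrative only: the class and method are hypothetical and are not existing Ozone code (the real change would presumably live in the startup scripts), and leaving the options untouched when the operator has already sized the young generation is my own assumption rather than part of the proposal.

```java
public class CmsNewRatioOpts {
    static final String CMS_FLAG = "-XX:+UseConcMarkSweepGC";
    static final String NEW_RATIO = "-XX:NewRatio=3";

    /**
     * Appends -XX:NewRatio=3 only when CMS is selected and the options do not
     * already size the young generation explicitly (assumed policy).
     */
    static String withNewRatio(String jvmOpts) {
        boolean usesCms = jvmOpts.contains(CMS_FLAG);
        boolean alreadySized = jvmOpts.contains("-XX:NewRatio=")
            || jvmOpts.contains("-XX:NewSize=")
            || jvmOpts.contains("-XX:MaxNewSize=")
            || jvmOpts.contains("-Xmn");
        return (usesCms && !alreadySized) ? jvmOpts + " " + NEW_RATIO : jvmOpts;
    }

    public static void main(String[] args) {
        System.out.println(withNewRatio("-Xmx8g -XX:+UseConcMarkSweepGC"));
        System.out.println(withNewRatio("-Xmx8g -XX:+UseG1GC"));
    }
}
```

Options that select G1 or another collector pass through unchanged, so their dynamic young generation sizing is left alone.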
For context, this problem was observed on a cluster approaching 900 nodes and
~70 petabytes, running on Azul Zulu JDK11. Flight Recorder data revealed that
the Ozone Manager, which was experiencing long GC pauses, had a maximum Young
Generation size of only 3.5GB against a maximum heap size of 92GB. The problem
seemed to be Young Generation heap thrashing, prematurely tenuring objects
until the old generation was full and driving spurious full GC pauses. Adding
-XX:NewRatio=3 resolved the problem.
To reproduce the behavior, increase the heap size on any Ozone cluster running
on a JDK version where ConcurrentMarkSweep is available (which I believe is any
version prior to JDK14). Comparing the Young Generation size to the total heap
size, for example with the jmap command, will show the discrepancy.
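As an alternative to inspecting a live process, the same comparison can be sketched with the standard java.lang.management API. The class below is hypothetical and reports on its own JVM, so it would need to be launched with the same heap and GC flags as the service under investigation. Matching pools by the words "Eden" and "Survivor" is an assumption about HotSpot pool naming, and the totals are approximate (for example, only one survivor space is reported).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class YoungGenReport {
    /** Assumes HotSpot-style pool names such as "Par Eden Space" or "Par Survivor Space". */
    static boolean isYoungPool(String poolName) {
        return poolName.contains("Eden") || poolName.contains("Survivor");
    }

    public static void main(String[] args) {
        long heapMax = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
        long youngMax = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP && isYoungPool(pool.getName())) {
                long max = pool.getUsage().getMax(); // -1 when the pool has no defined maximum
                if (max > 0) {
                    youngMax += max;
                }
            }
        }
        System.out.printf("heap max: %d MB, young gen max: %d MB%n",
            heapMax / (1024 * 1024), youngMax / (1024 * 1024));
    }
}
```

Running it once with and once without -XX:NewRatio=3, alongside -XX:+UseConcMarkSweepGC and a large -Xmx, should make the difference in young generation sizing visible.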
As a historical note, this was also a common GC-related problem in large HDFS
clusters; as far as I can tell the root cause never became common knowledge
there.
This is not a very high priority because ConcurrentMarkSweep is not even
available in JDK17 or 21, but I believe most working clusters still employ it,
and adding a java option is a simple change.