Ryan Blough created HDDS-14106:
----------------------------------

             Summary: Set -XX:NewRatio=3 explicitly when using 
ConcurrentMarkSweep GC
                 Key: HDDS-14106
                 URL: https://issues.apache.org/jira/browse/HDDS-14106
             Project: Apache Ozone
          Issue Type: Bug
    Affects Versions: 1.4.1, 2.0.0, 1.4.0, 2.1.0
            Reporter: Ryan Blough


The Young Generation heap size is chronically too low when using the 
ConcurrentMarkSweep garbage collector.

This is actually expected behavior, present in OpenJDK since JDK6 (the 
earliest reference I found is this OpenJDK issue: 
[https://bugs.openjdk.org/browse/JDK-6872335], so it predates Ozone). The root 
cause is that instead of honoring the default NewRatio value of 2, an ergonomic 
value is calculated based on the number of ParallelGC threads and the value of 
the CMSYoungGenPerWorker flag, plus some other adjustments. The original goal 
was to ensure that the ParallelGC run never went long (more than a second, 
say), but the logic was written when 4 was a high number of cores to have 
available on a server.
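
As a rough illustration of why the result stops tracking the heap size, the 
sketch below models the ergonomic cap as CMSYoungGenPerWorker multiplied by the 
number of parallel GC threads, limited by what NewRatio would otherwise allow. 
This is a simplifying assumption, not the HotSpot source: the real calculation 
applies further adjustments, and the 64 MiB per-worker value and the thread 
count of 32 are illustrative inputs rather than measurements. The class and 
method names are hypothetical.

```java
// Simplified model only; not the actual HotSpot implementation.
final class CmsYoungGenModel {
    private static final long MIB = 1024L * 1024;

    /** Assumed cap: per-worker allowance times the number of parallel GC threads. */
    static long ergonomicCapBytes(long perWorkerBytes, int gcThreads) {
        return perWorkerBytes * gcThreads;
    }

    /** Modelled young generation maximum when NewRatio is not set explicitly. */
    static long modelledYoungMaxBytes(long heapBytes, int newRatio,
                                      long perWorkerBytes, int gcThreads) {
        long ratioBased = heapBytes / (newRatio + 1L);
        return Math.min(ratioBased, ergonomicCapBytes(perWorkerBytes, gcThreads));
    }

    public static void main(String[] args) {
        long perWorker = 64 * MIB; // illustrative value for CMSYoungGenPerWorker
        int gcThreads = 32;        // illustrative parallel GC thread count
        for (long heapGiB : new long[] {4, 16, 92}) {
            long young = modelledYoungMaxBytes(heapGiB * 1024 * MIB, 2, perWorker, gcThreads);
            System.out.printf("heap=%d GiB -> modelled young max %d MiB%n",
                heapGiB, young / MIB);
        }
    }
}
```

Under this model the cap depends only on the thread count once the heap is 
large enough, so growing the heap adds nothing to the young generation.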

In light of modern hardware, this logic is entirely obsolete. The known 
solution is to explicitly set the NewRatio value in the Java options, 
overriding the ergonomic calculation.

I recommend a value of -XX:NewRatio=3, resulting in a young generation heap 
size of 25% of the total heap size. I chose NewRatio=3 because Ozone operations 
do not appear to require a full 33% of the heap for Young Gen activity, and 
because we don't want to unduly reduce the size of the remainder of the heap 
during an upgrade or similar.
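
The percentages follow from the NewRatio definition: the young generation gets 
heap / (NewRatio + 1). The sketch below works this out for the 92GB heap 
reported further down in this issue. It is arithmetic only; the JVM also 
applies alignment and minimum/maximum bounds that are ignored here, and the 
class and method names are hypothetical.

```java
// Illustrative arithmetic only: ignores HotSpot alignment and size bounds.
final class NewRatioMath {
    /** Young generation size implied by -XX:NewRatio=<newRatio>: heap / (newRatio + 1). */
    static long youngGenBytes(long heapBytes, int newRatio) {
        return heapBytes / (newRatio + 1L);
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        long heap = 92 * gib; // heap size of the Ozone Manager described in this issue
        for (int ratio : new int[] {2, 3}) {
            long young = youngGenBytes(heap, ratio);
            System.out.printf("NewRatio=%d -> young %.1f GiB, old %.1f GiB%n",
                ratio, (double) young / gib, (double) (heap - young) / gib);
        }
    }
}
```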

Importantly, we should only set this when -XX:+UseConcMarkSweepGC is _also_ 
set; this problem does not impact other GC algorithms, and some of them 
(including G1GC, as I understand it) rely on dynamically adjusting the Young 
Generation as part of the algorithm.
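
The conditional rule can be sketched as follows. This is not Ozone's actual 
startup logic (which lives in shell scripts); it is a hypothetical Java helper 
that shows the intended decision: append -XX:NewRatio=3 only when the CMS flag 
(spelled -XX:+UseConcMarkSweepGC in HotSpot) is present. Skipping the default 
when the operator has already sized the young generation is an added 
assumption, not something this issue specifies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names are illustrative and not part of Ozone.
final class CmsNewRatioDefault {
    static final String CMS_FLAG = "-XX:+UseConcMarkSweepGC";
    static final String DEFAULT_NEW_RATIO = "-XX:NewRatio=3";

    /** Returns a copy of jvmOpts, adding the NewRatio default only for CMS. */
    static List<String> withNewRatioDefault(List<String> jvmOpts) {
        boolean usesCms = jvmOpts.contains(CMS_FLAG);
        // Assumption: respect any explicit young generation sizing already present.
        boolean youngGenAlreadySized = jvmOpts.stream().anyMatch(opt ->
            opt.startsWith("-XX:NewRatio=")
                || opt.startsWith("-XX:NewSize=")
                || opt.startsWith("-XX:MaxNewSize=")
                || opt.startsWith("-Xmn"));
        List<String> result = new ArrayList<>(jvmOpts);
        if (usesCms && !youngGenAlreadySized) {
            result.add(DEFAULT_NEW_RATIO);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(withNewRatioDefault(List.of("-Xmx92g", CMS_FLAG)));
        System.out.println(withNewRatioDefault(List.of("-Xmx92g", "-XX:+UseG1GC")));
    }
}
```

The sketch does not handle a later -XX:-UseConcMarkSweepGC that turns the 
collector back off; a real implementation would need to decide how to treat 
that case.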

Using NewRatio for our defaults is preferred to NewSize and MaxNewSize because 
it scales automatically with the total heap size, instead of requiring manual 
Java option changes every time the heap size changes.

For context, this problem was observed on a cluster approaching 900 nodes and 
~70 petabytes, running on Azul Zulu JDK11. Flight Recorder data showed that the 
Ozone Manager, which was experiencing long GC pauses, had a maximum Young 
Generation size of only 3.5GB against a maximum heap size of 92GB. The problem 
seemed to be Young Generation heap thrashing: objects were tenured prematurely 
until the old generation was full, driving spurious full GC pauses. Adding 
-XX:NewRatio=3 resolved the problem.

To reproduce the behavior, increase the heap size on any Ozone cluster running 
on a JDK version where ConcurrentMarkSweep is available (which I believe is any 
version prior to JDK14). Comparing the Young Generation size with the total 
heap size, for example via the jmap command, will show the discrepancy.
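
As an alternative to jmap, the ratio can also be read through the standard 
java.lang.management API. The sketch below reports on the JVM it runs in, so it 
is meant for a scratch JVM started with the same heap and GC flags as the 
service, not for attaching to a running Ozone Manager. Identifying young 
generation pools by the words "Eden" and "Survivor" in their names is an 
assumption that matches the usual HotSpot pool names (for example "Par Eden 
Space" under CMS). The pool list typically reports only one survivor space, so 
the printed fraction is approximate. The class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical diagnostic; reports on the JVM it is running in.
final class YoungGenReport {
    /** Assumption: HotSpot young generation pools are named "... Eden ..." or "... Survivor ...". */
    static boolean isYoungPool(String poolName) {
        return poolName.contains("Eden") || poolName.contains("Survivor");
    }

    public static void main(String[] args) {
        long heapMax = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
        long youngMax = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache, and other non-heap pools
            }
            MemoryUsage usage = pool.getUsage();
            long max = (usage == null) ? -1 : usage.getMax(); // -1 means undefined
            System.out.printf("%-24s max=%d bytes%n", pool.getName(), max);
            if (isYoungPool(pool.getName()) && max > 0) {
                youngMax += max;
            }
        }
        System.out.printf("heap max=%d bytes, young max=%d bytes%n", heapMax, youngMax);
        if (heapMax > 0) {
            System.out.printf("young generation fraction: %.1f%%%n", 100.0 * youngMax / heapMax);
        }
    }
}
```

Running it once with and once without -XX:NewRatio=3, alongside 
-XX:+UseConcMarkSweepGC and a large -Xmx, should make the difference visible.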

As a historical note, this was also a common GC-related problem in large HDFS 
clusters; as far as I can tell the root cause never became common knowledge 
there.

This is not a very high priority because ConcurrentMarkSweep is not even 
available in JDK17 or 21, but I believe most working clusters still employ it, 
and adding a java option is a simple change.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
