heesung-sn opened a new pull request, #15762:
URL: https://github.com/apache/pulsar/pull/15762

   
   Master Issue: #15207
   
   ### Motivation
   
   # Pulsar Server Default GC Update
   
   As Java 17 will be officially required for pulsar-2.11+, it is worth 
revisiting Pulsar’s default GC configuration 
   and considering a newer GC, ZGC or ShenandoahGC, as the new default.
   
   ## ZGC:
   Introductory articles on ZGC are easy to find[1][2][3]. I personally found 
the following passage persuasive.
   
   “The primary goals of ZGC are low latency, scalability, and ease of use. To 
achieve this, ZGC allows a Java application to continue running while it 
performs all garbage collection operations except thread stack scanning. It 
scales from a few hundred MB to TB-size Java heaps, while consistently 
maintaining very low pause times—typically within 2 ms.
   The implications of predictably low pause times could be profound for both 
application developers and system architects. Developers will no longer need to 
worry about designing elaborate ways to avoid garbage collection pauses. And 
system architects will not require specialized GC performance tuning expertise 
to achieve the dependably low pause times that are very important for so many 
use cases. This makes ZGC a good fit for applications that require large 
amounts of memory, such as with big data. However, ZGC is also a good candidate 
for smaller heaps that require predictable and extremely low pause times.”[3]
   
   ### The fewer settings, the better
   One might tune G1GC flags further to outperform ZGC, but our goal is to make 
the default GC perform well enough to cover general use cases; it should be 
rare for users to need further GC tuning. ZGC is promoted as requiring less 
tuning:
   1. ZGC is designed to keep pause times low
   2. ZGC scales well regardless of application heap size, from hundreds of MBs 
to TBs
   
   
   ## ShenandoahGC:
   ShenandoahGC has a design similar to ZGC's and also targets low pause times. 
However, because ShenandoahGC is not officially supported by Oracle, it is 
unavailable in Oracle-built OpenJDK binaries[4][5]. Hence, between ShenandoahGC 
and ZGC, Pulsar should probably take the more widely available option, ZGC, 
also considering future support.
   Individual Pulsar users can still override this default GC, depending on 
their use case and OpenJDK version.
   
   
   [1] https://wiki.openjdk.java.net/display/zgc/Main
   
   [2] 
https://developers.redhat.com/articles/2021/11/02/how-choose-best-java-garbage-collector
   
   [3] 
https://blogs.oracle.com/javamagazine/post/understanding-the-jdks-new-superfast-garbage-collectors
   
   [4] 
https://developers.redhat.com/blog/2019/04/19/not-all-openjdk-12-builds-include-shenandoah-heres-why
   
   [5] https://bugs.openjdk.java.net/browse/JDK-8215030
   
   ## Performance Tests:
   
   To confirm the performance benefits, we ran the OpenMessaging Benchmark.
   In this test, we skipped journaling to put more pressure on the JVM GC.
   
   ### Max Throughput Test
   - Workload : 1-topic-100-partitions-1kb-4p-4c-2000k
   
   | Metric | Java11 G1GC | Java17 G1GC | Java17 ZGC | Java17 ShenandoahGC |
   |:---|:----|:-----|:---------|:--------------------------------|
   | Avg pub rate (MB/s) | 1784 | 1703 | 1711 | 1618 |
   | Avg cons rate (MB/s) | 1778 | 1701 | 1711 | 1619 |
   | Avg backlog count (k) | 3159 | 121 | 30 | 64 |
   | Avg pub latency (ms) | 286 | 299 | 296 | 294 |
   
   
   ### Latency Test
   - Workload: 100-partitions-1kb-4p-4c-500k
   
   | Metric | Java11 G1GC | Java17 G1GC | Java17 ZGC | Java17 ShenandoahGC |
   |:---|:----|:-----|:---------|:--------------------------------|
   | P999 pub latency (ms) | 1.8 | 2.1 | 2.1 | 2.0 |
   | P9999 pub latency (ms) | 37.7 | 32.9 | 20.2 | 37 |
   
   
   ### Test Result Analysis
   #### ZGC performs well
   
   In the Max Throughput Test, ZGC performed well, keeping the lowest average 
backlog (30k) 
   while sustaining an average throughput of 1711 MB/s.
   
   In the Latency Test, although the overall latency difference is not very 
significant,
   ZGC showed the lowest p9999 pub latency, 20.2 ms.
   
   
   
   
   ### Modifications
   
   ## Pulsar Default GC Flag Update Proposal
   
   ### Before:
   https://github.com/apache/pulsar/blob/master/conf/pulsar_env.sh#L48
   
   - -XX:+UseG1GC
   - -XX:MaxGCPauseMillis=10
   - -XX:+ParallelRefProcEnabled
   - -XX:+UnlockExperimentalVMOptions
   - -XX:+DoEscapeAnalysis
   - -XX:ParallelGCThreads=32
   - -XX:ConcGCThreads=32
   - -XX:G1NewSizePercent=50
   - -XX:+DisableExplicitGC
   
   ### After:
   - -XX:+UseZGC
   - -XX:+PerfDisableSharedMem
   - -XX:+AlwaysPreTouch
   
   ### Update Details
   
   - Replace -XX:+UseG1GC with -XX:+UseZGC
     - Pulsar adopts a newer GC technology that targets low latency, 
scalability, and ease of use
     - ZGC performs well on Pulsar according to the test result above
   
   - Remove -XX:MaxGCPauseMillis=10
     - Not needed for ZGC. Instead, we rely on ZGC's inherently low pause times.
     - ref: https://wiki.openjdk.java.net/display/zgc/Main
   - Remove -XX:+ParallelRefProcEnabled
     - Not needed for ZGC. ZGC is inherently concurrent, including its 
reference processing. ZReferenceProcessor uses ZWorkers, whose size is 
initialized based on ParallelGCThreads and ConcGCThreads.
     - ref: https://wiki.openjdk.java.net/display/zgc/Main
     - ref: 
https://github.com/openjdk/zgc/blob/master/src/hotspot/share/gc/z/zReferenceProcessor.cpp#L423-L434
     - ref: 
https://github.com/openjdk/zgc/blob/master/src/hotspot/share/gc/z/zWorkers.cpp#L64-L66
   - Remove -XX:+UnlockExperimentalVMOptions
       - Not required for ZGC in Java17.
       - ref: https://wiki.openjdk.java.net/display/zgc/Main
   - Remove -XX:+DoEscapeAnalysis
       - Enabled by default in Java17, so there is no need to set this flag 
explicitly.
       - ref: https://docs.oracle.com/en/java/javase/17/docs/specs/man/java.html
   - Remove -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32
       - These are hardcoded thread counts, which could be oversized or 
undersized depending on the runtime instance type. It is better to use the 
default thread counts, which are proportional to the number of cores on the 
instance. By default, ZGC sizes -XX:ParallelGCThreads at 60% of the CPUs and 
-XX:ConcGCThreads at 25% of the CPUs.
       - ref: https://docs.oracle.com/en/java/javase/17/docs/specs/man/java.html
       - ref: 
https://github.com/openjdk/zgc/blob/master/src/hotspot/share/gc/z/zHeuristics.cpp#L87-L104
       - ref: 
https://github.com/openjdk/zgc/blob/master/src/hotspot/share/gc/z/zArguments.cpp#L57-L73
   - Remove -XX:G1NewSizePercent=50
       - This is a G1GC-specific setting and is irrelevant for ZGC.
       - ref: https://wiki.openjdk.java.net/display/zgc/Main
   - Remove -XX:+DisableExplicitGC
       - Pulsar code should not call System.gc() explicitly, so this flag 
should not be needed.
       - ref: https://docs.oracle.com/en/java/javase/17/docs/specs/man/java.html
   - Add -XX:+PerfDisableSharedMem
     - During GC, writing JVM performance data to shared memory can take a 
long time due to disk I/O, unless the JVM performance data file is placed on a 
RAM disk or tmpfs.
     - ref: 
https://github.com/openjdk/zgc/blob/master/src/hotspot/os/posix/perfMemory_posix.cpp#L1219-L1232
     - ref: https://issues.apache.org/jira/browse/CASSANDRA-9242
     - ref: https://groups.google.com/g/mechanical-sympathy/c/9SP4IM-MUrI
   - Add -XX:+AlwaysPreTouch
     - This reduces cold-start latency by converting virtual memory to physical 
memory up front. Page access is faster later, as the pages will already be 
loaded into memory. The tradeoff is a higher server bootstrap time.
     - ref: https://docs.oracle.com/en/java/javase/17/docs/specs/man/java.html
     - ref: https://access.redhat.com/solutions/2685771
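
The default thread sizing mentioned in the list above (60% of the CPUs for -XX:ParallelGCThreads, 25% for -XX:ConcGCThreads) can be approximated with a rough sketch like the following. It assumes the share is rounded up to a whole thread and ignores any further caps the JVM may apply (for example, based on heap size). `gc_workers` is a hypothetical helper, not a JVM or Pulsar command.

```shell
# Hypothetical helper: threads = ceil(cpus * percent / 100), using integer arithmetic.
gc_workers() {
  cpus=$1
  percent=$2
  echo $(( (cpus * percent + 99) / 100 ))
}

# Compare the proportional defaults with the previously hardcoded value of 32.
for cpus in 4 16 64; do
  echo "cpus=$cpus parallel=$(gc_workers "$cpus" 60) concurrent=$(gc_workers "$cpus" 25) (previously 32/32)"
done
```

Under these assumptions a 4-core instance would get 3 parallel and 1 concurrent GC threads instead of 32 each, which illustrates why the hardcoded counts can be oversized on small instances.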
   
   
   ### Verifying this change
   
   - [x] Make sure that the change passes the CI checks.
   
   This change is already covered by existing tests, such as the CI checks.
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If `yes` was chosen, please highlight the changes*
   
     - Dependencies (does it add or upgrade a dependency): (yes / **no**)
     - The public API: (yes / **no**)
     - The schema: (yes / **no** / don't know)
     - The default values of configurations: (**yes** / no)
     - The wire protocol: (yes / **no**)
     - The rest endpoints: (yes / **no**)
     - The admin cli options: (yes / **no**)
     - Anything that affects deployment: (**yes** / no / don't know)
   
   ### Documentation
   
   Check the box below or label this PR directly.
   
   Need to update docs? 
   
   - [ ] `doc-required` 
   (Your PR needs to update docs and you will update later)
     
   - [x] `no-need-doc` 
   This changes Pulsar's internal default GC settings, but we probably need to 
mention it in the release notes.
     
   - [ ] `doc` 
   (Your PR contains doc changes)
   
   - [ ] `doc-added`
   (Docs have been already added)

