On Wed, 24 Jun 2026 10:25:09 GMT, Aleksey Shipilev <[email protected]> wrote:
>> While following up on concurrent marking performance, I noticed that we
>> stopped / failed to inline some of the hot methods in the marking loop. We
>> need to rework this.
>>
>> This PR replaces the build-time GCC-specific "bump" for inlining heuristics
>> with explicit inlining hints across the hot path. I have eyeballed the
>> profiles on typical workloads and the inlining makes sense now.
>>
>> Additional testing:
>> - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah`
>> - [x] Ad-hoc marking performance tests
>> - [ ] Regular testing pipelines
>>
>> ---------
>> - [x] I confirm that I make this contribution in accordance with the
>> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai).
>
> Aleksey Shipilev has updated the pull request with a new target base due to a
> merge or a rebase. The incremental webrev excludes the unrelated changes
> brought in by the merge/rebase. The pull request contains seven additional
> commits since the last revision:
>
> - Keep "inline" on do_chunked_array
> - Merge branch 'master' into JDK-8385643-shenandoah-rework-mark-inline
> - Cosmetics
> - Benchmarks show array path is still important
> - A few more cosmetics
> - Touchup
> - Work

On a SPECjbb preset-IR run, seeing +20...30% faster concurrent marks as well:

# Baseline
[170.333s][info][gc,stats] Pause Init Mark (G) = 0.031 s (a = 201 us) (n = 157) (lvls, us = 64, 102, 117, 154, 2280)
[170.333s][info][gc,stats] Pause Init Mark (N) = 0.004 s (a = 26 us) (n = 157) (lvls, us = 17, 22, 24, 28, 55)
[170.333s][info][gc,stats] Update Region States = 0.002 s (a = 11 us) (n = 157) (lvls, us = 6, 8, 10, 12, 32)
[170.333s][info][gc,stats] Propagate GC State = 0.000 s (a = 2 us) (n = 157) (lvls, us = 1, 2, 2, 2, 10)
[170.333s][info][gc,stats] Concurrent Mark Roots = 0.052 s (a = 333 us) (n = 157) (lvls, us = 176, 262, 289, 318, 2094)
[170.333s][info][gc,stats] CMR: Threads = 0.231 s (a = 1472 us) (n = 157) (lvls, us = 779, 1250, 1328, 1465, 6545)
[170.333s][info][gc,stats] CMR: VM Strongs = 0.009 s (a = 55 us) (n = 157) (lvls, us = 31, 39, 43, 47, 1031)
[170.333s][info][gc,stats] CMR: Classes = 0.016 s (a = 100 us) (n = 157) (lvls, us = 67, 81, 90, 105, 775)
[170.333s][info][gc,stats] Concurrent Marking = 35.671 s (a = 227203 us) (n = 157) (lvls, us = 6016, 226562, 232422, 238281, 254850)
[170.333s][info][gc,stats] CM: Work = 283.904 s (a = 1808304 us) (n = 157) (lvls, us = 46875, 1796875, 1855469, 1894531, 2033774)
[170.333s][info][gc,stats] Flush SATB = 0.142 s (a = 902 us) (n = 157) (lvls, us = 105, 549, 623, 783, 5526)
[170.333s][info][gc,stats] Pause Final Mark (G) = 0.048 s (a = 304 us) (n = 157) (lvls, us = 123, 221, 238, 258, 3370)
[170.333s][info][gc,stats] Pause Final Mark (N) = 0.030 s (a = 193 us) (n = 157) (lvls, us = 96, 176, 191, 205, 306)
[170.333s][info][gc,stats] Flush SATB and Roots = 0.003 s (a = 22 us) (n = 157) (lvls, us = 7, 14, 15, 17, 108)
[170.333s][info][gc,stats] Propagate GC State = 0.000 s (a = 2 us) (n = 157) (lvls, us = 1, 2, 2, 2, 3)
[170.333s][info][gc,stats] Update Region States = 0.005 s (a = 31 us) (n = 157) (lvls, us = 22, 30, 31, 32, 43)
[170.333s][info][gc,stats] Choose Collection Set = 0.017 s (a = 111 us) (n = 157) (lvls, us = 35, 98, 113, 121, 167)
[170.333s][info][gc,stats] Rebuild Free Set = 0.003 s (a = 17 us) (n = 157) (lvls, us = 13, 16, 16, 17, 41)

# Patched
[171.311s][info][gc,stats] Pause Init Mark (G) = 0.039 s (a = 270 us) (n = 143) (lvls, us = 77, 104, 113, 162, 2290)
[171.311s][info][gc,stats] Pause Init Mark (N) = 0.004 s (a = 25 us) (n = 143) (lvls, us = 12, 22, 23, 27, 55)
[171.311s][info][gc,stats] Update Region States = 0.002 s (a = 11 us) (n = 143) (lvls, us = 6, 8, 9, 12, 29)
[171.311s][info][gc,stats] Propagate GC State = 0.000 s (a = 2 us) (n = 143) (lvls, us = 1, 2, 2, 2, 3)
[171.311s][info][gc,stats] Concurrent Mark Roots = 0.051 s (a = 355 us) (n = 143) (lvls, us = 213, 271, 287, 311, 6407)
[171.311s][info][gc,stats] CMR: Threads = 0.211 s (a = 1475 us) (n = 143) (lvls, us = 801, 1289, 1367, 1465, 6937)
[171.311s][info][gc,stats] CMR: VM Strongs = 0.008 s (a = 54 us) (n = 143) (lvls, us = 31, 38, 44, 49, 638)
[171.311s][info][gc,stats] CMR: Classes = 0.014 s (a = 97 us) (n = 143) (lvls, us = 61, 74, 88, 102, 678)
[171.311s][info][gc,stats] Concurrent Marking = 26.880 s (a = 187970 us) (n = 143) (lvls, us = 4082, 187500, 191406, 197266, 210714)
[171.311s][info][gc,stats] CM: Work = 213.791 s (a = 1495043 us) (n = 143) (lvls, us = 31641, 1484375, 1523438, 1562500, 1665250)
[171.311s][info][gc,stats] Flush SATB = 0.119 s (a = 836 us) (n = 143) (lvls, us = 89, 607, 660, 779, 2852)
[171.311s][info][gc,stats] Pause Final Mark (G) = 0.041 s (a = 283 us) (n = 143) (lvls, us = 115, 232, 246, 266, 1350)
[171.311s][info][gc,stats] Pause Final Mark (N) = 0.029 s (a = 203 us) (n = 143) (lvls, us = 78, 188, 199, 215, 300)
[171.311s][info][gc,stats] Flush SATB and Roots = 0.003 s (a = 21 us) (n = 143) (lvls, us = 5, 13, 14, 16, 105)
[171.311s][info][gc,stats] Propagate GC State = 0.000 s (a = 2 us) (n = 143) (lvls, us = 1, 2, 2, 2, 3)
[171.311s][info][gc,stats] Update Region States = 0.005 s (a = 32 us) (n = 143) (lvls, us = 11, 30, 31, 33, 50)
[171.311s][info][gc,stats] Choose Collection Set = 0.017 s (a = 120 us) (n = 143) (lvls, us = 34, 109, 123, 131, 180)
[171.311s][info][gc,stats] Rebuild Free Set = 0.002 s (a = 17 us) (n = 143) (lvls, us = 14, 16, 17, 18, 30)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/31634#issuecomment-4788576410
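
For anyone who wants to cross-check the speedup against the `gc,stats` averages quoted in this comment, here is a minimal sketch. It is not part of the PR or of any JDK tooling: the regex, the helper names `parse_stats` and `speedup`, and the choice to compare the per-cycle averages (the `a = ... us` field) are all assumptions made for illustration. Only the two marking lines from each run are copied in as sample input.

```python
import re

# Matches one [gc,stats] summary line and captures the phase name, total
# seconds, average microseconds, and sample count. The pattern is an
# assumption derived from the lines quoted in this thread.
LINE_RE = re.compile(
    r"\[gc,stats\]\s+(?P<phase>.+?)\s+=\s+(?P<total>[\d.]+) s\s+"
    r"\(a =\s+(?P<avg>\d+) us\)\s+\(n =\s+(?P<n>\d+)\)"
)

def parse_stats(text):
    """Return {phase: (total_s, avg_us, n)}. Some phase names repeat in the
    full log (e.g. "Propagate GC State"); the first occurrence is kept."""
    stats = {}
    for m in LINE_RE.finditer(text):
        entry = (float(m.group("total")), int(m.group("avg")), int(m.group("n")))
        stats.setdefault(m.group("phase"), entry)
    return stats

def speedup(baseline, patched, phase):
    """Ratio of average phase times; a value above 1.0 means patched is faster."""
    return baseline[phase][1] / patched[phase][1]

# Sample input: the marking lines copied from the two runs in this comment.
BASELINE = """
[170.333s][info][gc,stats] Concurrent Marking = 35.671 s (a = 227203 us) (n = 157) (lvls, us = 6016, 226562, 232422, 238281, 254850)
[170.333s][info][gc,stats] CM: Work = 283.904 s (a = 1808304 us) (n = 157) (lvls, us = 46875, 1796875, 1855469, 1894531, 2033774)
"""
PATCHED = """
[171.311s][info][gc,stats] Concurrent Marking = 26.880 s (a = 187970 us) (n = 143) (lvls, us = 4082, 187500, 191406, 197266, 210714)
[171.311s][info][gc,stats] CM: Work = 213.791 s (a = 1495043 us) (n = 143) (lvls, us = 31641, 1484375, 1523438, 1562500, 1665250)
"""

base = parse_stats(BASELINE)
patch = parse_stats(PATCHED)
for phase in ("Concurrent Marking", "CM: Work"):
    print(f"{phase}: {speedup(base, patch, phase):.2f}x")
```

Averages are compared rather than totals because the two runs completed a different number of cycles (n = 157 vs n = 143), so the totals are not directly comparable.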
