On Wed, 24 Jun 2026 10:25:09 GMT, Aleksey Shipilev <[email protected]> wrote:

>> While following up on concurrent marking performance, I noticed that we
>> stopped inlining (or failed to inline) some of the hot methods in the
>> marking loop. We need to rework this.
>> 
>> This PR replaces the build-time, GCC-specific "bump" to the inlining
>> heuristics with explicit inlining hints across the hot path. I have eyeballed
>> the profiles on typical workloads, and the inlining decisions make sense now.
>> 
>> Additional testing:
>>  - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah`
>>  - [x] Ad-hoc marking performance tests
>>  - [ ] Regular testing pipelines
>> 
>> ---------
>> - [x] I confirm that I make this contribution in accordance with the 
>> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai).
>
> Aleksey Shipilev has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains seven additional 
> commits since the last revision:
> 
>  - Keep "inline" on do_chunked_array
>  - Merge branch 'master' into JDK-8385643-shenandoah-rework-mark-inline
>  - Cosmetics
>  - Benchmarks show array path is still important
>  - A few more cosmetics
>  - Touchup
>  - Work
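
For readers unfamiliar with the technique, here is a minimal sketch of an explicit inlining hint on a hot marking path. It assumes GCC/Clang's `always_inline` attribute, with an MSVC `__forceinline` fallback. The macro `SKETCH_ALWAYSINLINE`, the `Obj` type, and the `try_mark`/`push_refs`/`mark_from` functions are hypothetical. They are not the Shenandoah code touched by this PR, which works on real heap objects and task queues. The point is only that the inlining decision is stated on the function itself rather than left to compiler heuristics or to build flags that tune those heuristics.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical macro for this sketch only; HotSpot has its own
// compiler-specific macros, and this name is not taken from the patch.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_ALWAYSINLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define SKETCH_ALWAYSINLINE __forceinline
#else
#define SKETCH_ALWAYSINLINE inline
#endif

// Toy heap object: a mark bit plus outgoing references (indices into the heap).
struct Obj {
  bool marked = false;
  std::vector<std::size_t> refs;
};

// Tiny, very hot step executed once per visited object: force inlining
// instead of trusting the compiler's size/growth heuristics.
SKETCH_ALWAYSINLINE bool try_mark(Obj& o) {
  if (o.marked) {
    return false;  // already visited
  }
  o.marked = true;
  return true;
}

// Also on the hot path: push every outgoing reference onto the work stack.
SKETCH_ALWAYSINLINE void push_refs(const Obj& o, std::vector<std::size_t>& stack) {
  for (std::size_t r : o.refs) {
    stack.push_back(r);
  }
}

// The marking loop: drain a work stack starting at `root` and return the
// number of objects newly marked.
std::size_t mark_from(std::vector<Obj>& heap, std::size_t root) {
  std::vector<std::size_t> stack{root};
  std::size_t marked = 0;
  while (!stack.empty()) {
    std::size_t i = stack.back();
    stack.pop_back();
    assert(i < heap.size());
    if (!try_mark(heap[i])) {
      continue;
    }
    ++marked;
    push_refs(heap[i], stack);
  }
  return marked;
}
```

Plain C++ `inline` mostly affects linkage and is only a weak inlining hint. An always-inline attribute is the stronger, compiler-specific knob, so it is worth reserving for small functions that profiles show on the hot path.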

On a SPECjbb preset-IR run, I am also seeing 20-30% faster concurrent marks:


# Baseline
[170.333s][info][gc,stats] Pause Init Mark (G)            =    0.031 s (a =      201 us) (n =   157) (lvls, us =       64,      102,      117,      154,     2280)
[170.333s][info][gc,stats] Pause Init Mark (N)            =    0.004 s (a =       26 us) (n =   157) (lvls, us =       17,       22,       24,       28,       55)
[170.333s][info][gc,stats]   Update Region States         =    0.002 s (a =       11 us) (n =   157) (lvls, us =        6,        8,       10,       12,       32)
[170.333s][info][gc,stats]   Propagate GC State           =    0.000 s (a =        2 us) (n =   157) (lvls, us =        1,        2,        2,        2,       10)
[170.333s][info][gc,stats] Concurrent Mark Roots          =    0.052 s (a =      333 us) (n =   157) (lvls, us =      176,      262,      289,      318,     2094)
[170.333s][info][gc,stats]   CMR: Threads                 =    0.231 s (a =     1472 us) (n =   157) (lvls, us =      779,     1250,     1328,     1465,     6545)
[170.333s][info][gc,stats]   CMR: VM Strongs              =    0.009 s (a =       55 us) (n =   157) (lvls, us =       31,       39,       43,       47,     1031)
[170.333s][info][gc,stats]   CMR: Classes                 =    0.016 s (a =      100 us) (n =   157) (lvls, us =       67,       81,       90,      105,      775)
[170.333s][info][gc,stats] Concurrent Marking             =   35.671 s (a =   227203 us) (n =   157) (lvls, us =     6016,   226562,   232422,   238281,   254850)
[170.333s][info][gc,stats]   CM: Work                     =  283.904 s (a =  1808304 us) (n =   157) (lvls, us =    46875,  1796875,  1855469,  1894531,  2033774)
[170.333s][info][gc,stats]   Flush SATB                   =    0.142 s (a =      902 us) (n =   157) (lvls, us =      105,      549,      623,      783,     5526)
[170.333s][info][gc,stats] Pause Final Mark (G)           =    0.048 s (a =      304 us) (n =   157) (lvls, us =      123,      221,      238,      258,     3370)
[170.333s][info][gc,stats] Pause Final Mark (N)           =    0.030 s (a =      193 us) (n =   157) (lvls, us =       96,      176,      191,      205,      306)
[170.333s][info][gc,stats]   Flush SATB and Roots         =    0.003 s (a =       22 us) (n =   157) (lvls, us =        7,       14,       15,       17,      108)
[170.333s][info][gc,stats]   Propagate GC State           =    0.000 s (a =        2 us) (n =   157) (lvls, us =        1,        2,        2,        2,        3)
[170.333s][info][gc,stats]   Update Region States         =    0.005 s (a =       31 us) (n =   157) (lvls, us =       22,       30,       31,       32,       43)
[170.333s][info][gc,stats]   Choose Collection Set        =    0.017 s (a =      111 us) (n =   157) (lvls, us =       35,       98,      113,      121,      167)
[170.333s][info][gc,stats]   Rebuild Free Set             =    0.003 s (a =       17 us) (n =   157) (lvls, us =       13,       16,       16,       17,       41)

# Patched
[171.311s][info][gc,stats] Pause Init Mark (G)            =    0.039 s (a =      270 us) (n =   143) (lvls, us =       77,      104,      113,      162,     2290)
[171.311s][info][gc,stats] Pause Init Mark (N)            =    0.004 s (a =       25 us) (n =   143) (lvls, us =       12,       22,       23,       27,       55)
[171.311s][info][gc,stats]   Update Region States         =    0.002 s (a =       11 us) (n =   143) (lvls, us =        6,        8,        9,       12,       29)
[171.311s][info][gc,stats]   Propagate GC State           =    0.000 s (a =        2 us) (n =   143) (lvls, us =        1,        2,        2,        2,        3)
[171.311s][info][gc,stats] Concurrent Mark Roots          =    0.051 s (a =      355 us) (n =   143) (lvls, us =      213,      271,      287,      311,     6407)
[171.311s][info][gc,stats]   CMR: Threads                 =    0.211 s (a =     1475 us) (n =   143) (lvls, us =      801,     1289,     1367,     1465,     6937)
[171.311s][info][gc,stats]   CMR: VM Strongs              =    0.008 s (a =       54 us) (n =   143) (lvls, us =       31,       38,       44,       49,      638)
[171.311s][info][gc,stats]   CMR: Classes                 =    0.014 s (a =       97 us) (n =   143) (lvls, us =       61,       74,       88,      102,      678)
[171.311s][info][gc,stats] Concurrent Marking             =   26.880 s (a =   187970 us) (n =   143) (lvls, us =     4082,   187500,   191406,   197266,   210714)
[171.311s][info][gc,stats]   CM: Work                     =  213.791 s (a =  1495043 us) (n =   143) (lvls, us =    31641,  1484375,  1523438,  1562500,  1665250)
[171.311s][info][gc,stats]   Flush SATB                   =    0.119 s (a =      836 us) (n =   143) (lvls, us =       89,      607,      660,      779,     2852)
[171.311s][info][gc,stats] Pause Final Mark (G)           =    0.041 s (a =      283 us) (n =   143) (lvls, us =      115,      232,      246,      266,     1350)
[171.311s][info][gc,stats] Pause Final Mark (N)           =    0.029 s (a =      203 us) (n =   143) (lvls, us =       78,      188,      199,      215,      300)
[171.311s][info][gc,stats]   Flush SATB and Roots         =    0.003 s (a =       21 us) (n =   143) (lvls, us =        5,       13,       14,       16,      105)
[171.311s][info][gc,stats]   Propagate GC State           =    0.000 s (a =        2 us) (n =   143) (lvls, us =        1,        2,        2,        2,        3)
[171.311s][info][gc,stats]   Update Region States         =    0.005 s (a =       32 us) (n =   143) (lvls, us =       11,       30,       31,       33,       50)
[171.311s][info][gc,stats]   Choose Collection Set        =    0.017 s (a =      120 us) (n =   143) (lvls, us =       34,      109,      123,      131,      180)
[171.311s][info][gc,stats]   Rebuild Free Set             =    0.002 s (a =       17 us) (n =   143) (lvls, us =       14,       16,       17,       18,       30)
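
One way to sanity-check the improvement is to compare matching `gc,stats` lines from the two runs. The sketch below pulls out three fields from each line: the total, the per-cycle average (`a`), and the cycle count (`n`). It assumes the exact line layout printed above. `PhaseStats`, `parse_stats_line` and `speedup` are hypothetical helpers, not part of HotSpot. Per-cycle averages are the fairer comparison, because the two runs completed different numbers of cycles (157 vs 143).

```cpp
#include <cassert>
#include <regex>
#include <stdexcept>
#include <string>

// Fields of one Shenandoah gc,stats line (layout assumed from the logs above).
struct PhaseStats {
  double total_s;  // cumulative time over all cycles, seconds
  double avg_us;   // per-cycle average ("a ="), microseconds
  long cycles;     // number of samples ("n =")
};

// Extracts "= <total> s (a = <avg> us) (n = <count>)" from a stats line.
PhaseStats parse_stats_line(const std::string& line) {
  static const std::regex re(
      R"(=\s*([0-9.]+) s \(a =\s*([0-9]+) us\) \(n =\s*([0-9]+)\))");
  std::smatch m;
  if (!std::regex_search(line, m, re)) {
    throw std::invalid_argument("not a gc,stats line: " + line);
  }
  return PhaseStats{std::stod(m[1].str()), std::stod(m[2].str()),
                    std::stol(m[3].str())};
}

// Ratio > 1.0 means the patched build spent less time on that metric.
double speedup(double baseline, double patched) {
  assert(patched > 0.0);
  return baseline / patched;
}
```

Applied to the two `Concurrent Marking` lines above, the per-cycle average drops from 227203 us to 187970 us (about 1.21x). `CM: Work` shows the same ratio (1808304 us vs 1495043 us). The totals (35.671 s vs 26.880 s, about 1.33x) also fold in the 14 fewer cycles in the patched run.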

-------------

PR Comment: https://git.openjdk.org/jdk/pull/31634#issuecomment-4788576410
