https://gcc.gnu.org/g:063fbd5a10d47d4957d605ca917480d02e054249

commit r15-9143-g063fbd5a10d47d4957d605ca917480d02e054249
Author: Andi Kleen <a...@gcc.gnu.org>
Date:   Tue Apr 1 11:48:11 2025 -0700

    PR119482: Avoid mispredictions in bitmap_set_bit
    
    bitmap_set_bit checks the original value of the bit so it can return it
    to the caller, and then writes the new value back only if it changed.
    Most callers of bitmap_set_bit don't need the return value, but with the
    conditional store the CPU still has to predict the branch correctly, since
    gcc doesn't know how to emit a branchless conditional store without APX on
    x86 (even though CMOV could do it with a dummy target).
    Really, if-conversion should handle this case, but for now we can fix
    it here.
    
    This simple patch improves runtime by 15% for the test case in the PR,
    which is more than I expected given that the function only accounts for
    ~1.44% of the cycles, but I guess the mispredicts caused some downstream
    effects.
    
      cc1plus-bitmap -std=gnu++20 -O2 pr119482.cc  -quiet
      ran 1.15 ± 0.01 times faster than cc1plus -std=gnu++20 -O2 pr119482.cc  -quiet
    
    At least with this test case the total number of branches decreases
    drastically (from about 41.9 billion to about 37.1 billion). Even though
    the mispredict rate goes up slightly (1.64% to 1.79%), it is still a big
    win.
    
    $ perf stat -e branches,branch-misses,uncore_imc/cas_count_read/,uncore_imc/cas_count_write/ \
    -a ../obj-fast/gcc/cc1plus -std=gnu++20 -O2 pr119482.cc  -quiet -w
    
     Performance counter stats for 'system wide':
    
        41,932,957,091      branches
           686,117,623      branch-misses                    #    1.64% of all branches
             43,690.47 MiB  uncore_imc/cas_count_read/
             12,362.56 MiB  uncore_imc/cas_count_write/
    
          49.328633365 seconds time elapsed
    
    $ perf stat -e branches,branch-misses,uncore_imc/cas_count_read/,uncore_imc/cas_count_write/ \
    -a ../obj-fast/gcc/cc1plus-bitmap -std=gnu++20 -O2 pr119482.cc  -quiet -w
    
     Performance counter stats for 'system wide':
    
        37,092,113,179      branches
           663,641,708      branch-misses                    #    1.79% of all branches
             43,196.52 MiB  uncore_imc/cas_count_read/
             12,369.33 MiB  uncore_imc/cas_count_write/
    
          42.632458350 seconds time elapsed
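
    To put the two perf stat runs side by side, the relative changes can be
    worked out from the counters above. The snippet below is only an
    illustrative calculation: the constant and function names are made up for
    this example, and the values are copied from the output above. It suggests
    roughly 11.5% fewer branches and about 3% fewer branch misses in absolute
    terms, which is why the slightly higher miss rate is not a loss.

```cpp
#include <cstdio>

// Counter values copied from the two perf stat runs above
// (before = cc1plus, after = cc1plus-bitmap).
constexpr double branches_before = 41932957091.0;
constexpr double branches_after  = 37092113179.0;
constexpr double misses_before   = 686117623.0;
constexpr double misses_after    = 663641708.0;
constexpr double seconds_before  = 49.328633365;
constexpr double seconds_after   = 42.632458350;

// Relative reduction going from BEFORE to AFTER, as a fraction of BEFORE.
constexpr double
reduction (double before, double after)
{
  return (before - after) / before;
}

// Print the derived figures; the name is hypothetical.
void
print_summary ()
{
  std::printf ("branches:      %.2f%% fewer\n",
               100.0 * reduction (branches_before, branches_after));
  std::printf ("branch misses: %.2f%% fewer\n",
               100.0 * reduction (misses_before, misses_after));
  std::printf ("miss rate:     %.2f%% -> %.2f%%\n",
               100.0 * misses_before / branches_before,
               100.0 * misses_after / branches_after);
  std::printf ("elapsed time:  %.3fx faster\n",
               seconds_before / seconds_after);
}
```

    Calling print_summary () from a main function prints the derived figures.
    Note that these are system-wide counts (-a) from a single run each, so
    they are indicative rather than precise.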
    
    gcc/ChangeLog:
    
            PR middle-end/119482
            * bitmap.cc (bitmap_set_bit): Write back value unconditionally

Diff:
---
 gcc/bitmap.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/bitmap.cc b/gcc/bitmap.cc
index f5a64b495ab3..c3daa91c4f89 100644
--- a/gcc/bitmap.cc
+++ b/gcc/bitmap.cc
@@ -969,8 +969,8 @@ bitmap_set_bit (bitmap head, int bit)
   if (ptr != 0)
     {
       bool res = (ptr->bits[word_num] & bit_val) == 0;
-      if (res)
-       ptr->bits[word_num] |= bit_val;
+      /* Write back unconditionally to avoid branch mispredicts.  */
+      ptr->bits[word_num] |= bit_val;
       return res;
     }
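
For readers without the surrounding source, the difference between the two
variants in the diff above can be sketched in isolation. This is a simplified
illustration, not the GCC code: it uses a flat array of 64-bit words instead of
GCC's bitmap elements, and word_t, set_bit_conditional and
set_bit_unconditional are hypothetical names. Whether a compiler really emits
a branch for the first form depends on the compiler, target and options.

```cpp
#include <cstdint>

// Hypothetical stand-in for one word of a bitmap; not GCC's BITMAP_WORD.
using word_t = std::uint64_t;

// Old shape: the store is guarded by the test, so the compiler will
// typically emit a conditional branch around it.
bool
set_bit_conditional (word_t *words, unsigned bit)
{
  word_t bit_val = word_t{1} << (bit % 64);
  word_t &w = words[bit / 64];
  bool res = (w & bit_val) == 0;   // true if the bit was previously clear
  if (res)
    w |= bit_val;
  return res;
}

// New shape: the OR is idempotent, so storing unconditionally leaves the
// same word contents and removes the data-dependent branch.
bool
set_bit_unconditional (word_t *words, unsigned bit)
{
  word_t bit_val = word_t{1} << (bit % 64);
  word_t &w = words[bit / 64];
  bool res = (w & bit_val) == 0;   // true if the bit was previously clear
  w |= bit_val;                    // always write back
  return res;
}
```

Both functions return the same value and leave the same bits set. The
trade-off is that the unconditional form always writes to the word, even when
the bit was already set.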
