https://gcc.gnu.org/g:43afcb3a83c3648141285d80cd3d8a562047fb43

commit r16-5876-g43afcb3a83c3648141285d80cd3d8a562047fb43
Author: Vladimir N. Makarov <[email protected]>
Date:   Wed Dec 3 13:42:41 2025 -0500

    [PR85072, LRA]: Set a limit for considering other reload pseudo preferences
    
    Compilation of test in PR85072 takes a lot of time and memory, e.g. 17
    minutes and 23 GB memory on AMD 9900X.  The function in question has one 
million
    program points and one million pseudos. The culprits are
    live_reload_and_inheritance_pseudos bitmaps which are used to consider
    other reload pseudo preferences when assigning a register to a given
    pseudo.  The patch introduces a constraint regarding when those
    preferences are considered.  The patch decreases compilation time to
    about 10 minutes and memory consumption to about 2GB.
    
    gcc/ChangeLog:
    
            PR rtl-optimization/85072
            * lra-assigns.cc (init_live_reload_and_inheritance_pseudos):
            Improve calculation of live_reload_and_inheritance_pseudos and set
            a constraint to do this.
            * params.opt
            (lra-max-pseudos-points-log2-considered-for-preferences): New.
            * doc/invoke.texi
            (lra-max-pseudos-points-log2-considered-for-preferences): Document
            it.

Diff:
---
 gcc/doc/invoke.texi |  8 ++++++++
 gcc/lra-assigns.cc  | 23 +++++++++++++++++++++--
 gcc/params.opt      |  8 ++++++++
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2da2a27acd68..a77976594fbd 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17878,6 +17878,14 @@ uses for scheduling a loop.
 The max number of reload pseudos which are considered during
 spilling a non-reload pseudo.
 
+@item lra-max-pseudos-points-log2-considered-for-preferences
+The maximum @code{log2(number of reload pseudos * number of
+program points)} threshold when preferences for other reload pseudos
+are still considered. Taking these preferences into account helps to
+improve register allocation. However, for very large functions, a
+large value can result in significant compilation time and memory
+consumption. The default value is 30.
+
 @item max-pow-sqrt-depth
 Maximum depth of sqrt chains to use when synthesizing exponentiation
 by a real constant.
diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index 46f9c9d20e25..fbd5a5265cc6 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -419,12 +419,31 @@ init_live_reload_and_inheritance_pseudos (void)
   for (p = 0; p < lra_live_max_point; p++)
     bitmap_initialize (&live_reload_and_inheritance_pseudos[p],
                       &live_reload_and_inheritance_pseudos_bitmap_obstack);
+  if ((unsigned) (max_regno - lra_constraint_new_regno_start)  
+      >= (1U << lra_max_pseudos_points_log2_considered_for_preferences)
+      /  (lra_live_max_point + 1))
+    return;
+  bitmap_head start_points;
+  bitmap_initialize (&start_points,
+                    &live_hard_reg_pseudos_bitmap_obstack);
+  for (i = lra_constraint_new_regno_start; i < max_regno; i++)
+    for (r = lra_reg_info[i].live_ranges; r != NULL; r = r->next)
+      bitmap_set_bit (&start_points, r->start);
   for (i = lra_constraint_new_regno_start; i < max_regno; i++)
     {
       for (r = lra_reg_info[i].live_ranges; r != NULL; r = r->next)
-       for (p = r->start; p <= r->finish; p++)
-         bitmap_set_bit (&live_reload_and_inheritance_pseudos[p], i);
+       {
+         bitmap_iterator bi;
+         unsigned p;
+         EXECUTE_IF_SET_IN_BITMAP (&start_points, r->start, p, bi)
+           {
+             if (p > (unsigned) r->finish)
+               break;
+             bitmap_set_bit (&live_reload_and_inheritance_pseudos[p], i);
+           }
+       }
     }
+  bitmap_clear (&start_points);
 }
 
 /* Finalize data about living reload pseudos at any given program
diff --git a/gcc/params.opt b/gcc/params.opt
index 7c226355c086..b897d1d2660d 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -453,6 +453,14 @@ Minimal fall-through edge probability in percentage used 
to add BB to inheritanc
 Common Joined UInteger Var(param_lra_max_considered_reload_pseudos) Init(500) 
Param Optimization
 The max number of reload pseudos which are considered during spilling a 
non-reload pseudo.
 
+-param=lra-max-pseudos-points-log2-considered-for-preferences=
+Common Joined UInteger 
Var(lra_max_pseudos_points_log2_considered_for_preferences) Init(30) 
IntegerRange(0, 31) Param Optimization
+The maximum log2(number of reload pseudos * number of program points) threshold
+when preferences for other reload pseudos are still considered. Taking these
+preferences into account helps to improve register allocation. However, for
+very large functions, a large value can result in significant compilation time
+and memory consumption. The default value is 30.
+
 -param=lto-max-partition=
 Common Joined UInteger Var(param_max_partition_size) Init(1000000) Param
 Maximal size of a partition for LTO (in estimated instructions).

Reply via email to