The following patch is for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85072
The patch was successfully bootstrapped and tested on x86-64.
The commit message says how the patch improves LRA on the testcase which
contains a huge functions with 1M pseudos.
I'll try to improve LRA speed more but I do not guarantee I'll succeed.
commit 43afcb3a83c3648141285d80cd3d8a562047fb43
Author: Vladimir N. Makarov <[email protected]>
Date: Wed Dec 3 13:42:41 2025 -0500
[PR85072, LRA]: Set a limit for considering other reload pseudo preferences
Compilation of test in PR85072 takes a lot of time and memory, e.g. 17
minutes and 23 GB memory on AMD 9900X. The function in question has one million
program points and one million pseudos. The culprits are
live_reload_and_inheritance_pseudos bitmaps which are used to consider
other reload pseudo preferences when assigning a register to a given
pseudo. The patch introduces a constraint regarding when those
preferences are considered. The patch decreases compilation time to
about 10 minutes and memory consumption to about 2GB.
gcc/ChangeLog:
PR rtl-optimization/85072
* lra-assigns.cc (init_live_reload_and_inheritance_pseudos):
Improve calculation of live_reload_and_inheritance_pseudos and set
a constraint to do this.
* params.opt
(lra-max-pseudos-points-log2-considered-for-preferences): New.
* doc/invoke.texi
(lra-max-pseudos-points-log2-considered-for-preferences): Document
it.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2da2a27acd6..a77976594fb 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17878,6 +17878,14 @@ uses for scheduling a loop.
The max number of reload pseudos which are considered during
spilling a non-reload pseudo.
+@item lra-max-pseudos-points-log2-considered-for-preferences
+The maximum @code{log2(number of reload pseudos * number of
+program points)} threshold when preferences for other reload pseudos
+are still considered. Taking these preferences into account helps to
+improve register allocation. However, for very large functions, a
+large value can result in significant compilation time and memory
+consumption. The default value is 30.
+
@item max-pow-sqrt-depth
Maximum depth of sqrt chains to use when synthesizing exponentiation
by a real constant.
diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
index 46f9c9d20e2..fbd5a5265cc 100644
--- a/gcc/lra-assigns.cc
+++ b/gcc/lra-assigns.cc
@@ -419,12 +419,31 @@ init_live_reload_and_inheritance_pseudos (void)
for (p = 0; p < lra_live_max_point; p++)
bitmap_initialize (&live_reload_and_inheritance_pseudos[p],
&live_reload_and_inheritance_pseudos_bitmap_obstack);
+ if ((unsigned) (max_regno - lra_constraint_new_regno_start)
+ >= (1U << lra_max_pseudos_points_log2_considered_for_preferences)
+ / (lra_live_max_point + 1))
+ return;
+ bitmap_head start_points;
+ bitmap_initialize (&start_points,
+ &live_hard_reg_pseudos_bitmap_obstack);
+ for (i = lra_constraint_new_regno_start; i < max_regno; i++)
+ for (r = lra_reg_info[i].live_ranges; r != NULL; r = r->next)
+ bitmap_set_bit (&start_points, r->start);
for (i = lra_constraint_new_regno_start; i < max_regno; i++)
{
for (r = lra_reg_info[i].live_ranges; r != NULL; r = r->next)
- for (p = r->start; p <= r->finish; p++)
- bitmap_set_bit (&live_reload_and_inheritance_pseudos[p], i);
+ {
+ bitmap_iterator bi;
+ unsigned p;
+ EXECUTE_IF_SET_IN_BITMAP (&start_points, r->start, p, bi)
+ {
+ if (p > (unsigned) r->finish)
+ break;
+ bitmap_set_bit (&live_reload_and_inheritance_pseudos[p], i);
+ }
+ }
}
+ bitmap_clear (&start_points);
}
/* Finalize data about living reload pseudos at any given program
diff --git a/gcc/params.opt b/gcc/params.opt
index 7c226355c08..b897d1d2660 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -453,6 +453,14 @@ Minimal fall-through edge probability in percentage used to add BB to inheritanc
Common Joined UInteger Var(param_lra_max_considered_reload_pseudos) Init(500) Param Optimization
The max number of reload pseudos which are considered during spilling a non-reload pseudo.
+-param=lra-max-pseudos-points-log2-considered-for-preferences=
+Common Joined UInteger Var(lra_max_pseudos_points_log2_considered_for_preferences) Init(30) IntegerRange(0, 31) Param Optimization
+The maximum log2(number of reload pseudos * number of program points) threshold
+when preferences for other reload pseudos are still considered. Taking these
+preferences into account helps to improve register allocation. However, for
+very large functions, a large value can result in significant compilation time
+and memory consumption. The default value is 30.
+
-param=lto-max-partition=
Common Joined UInteger Var(param_max_partition_size) Init(1000000) Param
Maximal size of a partition for LTO (in estimated instructions).