On 31.07.25 at 12:30, Denis Chertykov wrote:
Wed, 30 Jul 2025 at 14:59, Georg-Johann Lay <a...@gjlay.de>:

Insn combine may come up with superfluous reg-reg moves, where the
combine people say that these are no problem since reg-alloc is supposed
to optimize them.  The issue is that the lower-subreg pass sitting
between combine and reg-alloc may split such moves, coming up with a zoo
of subregs which are only handled poorly by the register allocator.

This patch adds a new avr mini-pass that handles such cases.

As an example, take

int f_ffssi (long x)
{
      return __builtin_ffsl (x);
}

where f_ffssi and the called function __ffssi2 have the same interface,
i.e. no extra moves are required for the argument or for the return
value. However,

$ avr-gcc -S -Os -dp -mno-fuse-move ...

f_ffssi:
         mov r20,r22      ;  29  [c=4 l=1]  movqi_insn/0
         mov r21,r23      ;  30  [c=4 l=1]  movqi_insn/0
         mov r22,r24      ;  31  [c=4 l=1]  movqi_insn/0
         mov r23,r25      ;  32  [c=4 l=1]  movqi_insn/0
         mov r25,r23      ;  33  [c=4 l=4]  *movsi/0
         mov r24,r22
         mov r23,r21
         mov r22,r20
         rcall __ffssi2   ;  34  [c=16 l=1]  *ffssihi2.libgcc
         ret              ;  37  [c=0 l=1]  return

where all the moves add up to a no-op.  The -mno-fuse-move option
stops any attempts by the avr backend to clean up that mess.
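
To check the no-op claim by hand, the eight moves can be replayed on a
toy register file.  The following is only a sketch and not part of the
patch: the helper names mov and moves_are_noop are made up for this
illustration, and the register file is modelled as a plain byte array.
It assumes the 32-bit argument lives in r22..r25, as the listing
suggests.

```c
#include <string.h>

/* Toy model of the AVR register file r0..r31 (illustration only). */
enum { NREGS = 32 };

/* Hypothetical helper: emulate "mov rDST,rSRC". */
static void mov (unsigned char *r, int dst, int src)
{
  r[dst] = r[src];
}

/* Replay the eight moves from the listing above.  Return 1 if
   r22..r25 hold the same bytes afterwards as before, else 0.  */
int moves_are_noop (void)
{
  unsigned char r[NREGS], before[NREGS];

  /* Give every register a distinct value so a wrong move shows up. */
  for (int i = 0; i < NREGS; i++)
    r[i] = (unsigned char) (0xA0 + i);
  memcpy (before, r, sizeof r);

  /* Insns 29..32: four movqi_insn. */
  mov (r, 20, 22);
  mov (r, 21, 23);
  mov (r, 22, 24);
  mov (r, 23, 25);

  /* Insn 33: *movsi, printed as four byte moves. */
  mov (r, 25, 23);
  mov (r, 24, 22);
  mov (r, 23, 21);
  mov (r, 22, 20);

  return memcmp (&r[22], &before[22], 4) == 0;
}
```

In this model r22..r25 come out unchanged; only r20 and r21 are
overwritten along the way.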

gcc/
         * config/avr/avr-passes.def (avr_pass_2moves): Insert after combine.
         * config/avr/avr-passes.cc (make_avr_pass_2moves): New function.
         (pass_data avr_pass_data_2moves): New static variable.
         (avr_pass_2moves): New rtl_opt_pass.
         * config/avr/avr-protos.h (make_avr_pass_2moves): New proto.

Ok for trunk?

Ok
Please apply

Denis

Applied with the attached addendum, which adds a dedicated
option for better control.

Johann
diff --git a/gcc/common/config/avr/avr-common.cc b/gcc/common/config/avr/avr-common.cc
index 203a9652818..d8b982c4fa6 100644
--- a/gcc/common/config/avr/avr-common.cc
+++ b/gcc/common/config/avr/avr-common.cc
@@ -38,6 +38,7 @@ static const struct default_options avr_option_optimization_table[] =
     { OPT_LEVELS_1_PLUS, OPT_mmain_is_OS_task, NULL, 1 },
     { OPT_LEVELS_1_PLUS, OPT_mfuse_add_, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_mfuse_add_, NULL, 2 },
+    { OPT_LEVELS_1_PLUS, OPT_mfuse_move2, NULL, 1 },
     { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_mfuse_move_, NULL, 3 },
     { OPT_LEVELS_2_PLUS, OPT_mfuse_move_, NULL, 23 },
     { OPT_LEVELS_2_PLUS, OPT_msplit_bit_shift, NULL, 1 },
diff --git a/gcc/config/avr/avr-passes.cc b/gcc/config/avr/avr-passes.cc
index 1bcf211358e..69df6d263f6 100644
--- a/gcc/config/avr/avr-passes.cc
+++ b/gcc/config/avr/avr-passes.cc
@@ -4869,7 +4869,7 @@ public:
 
   unsigned int execute (function *func) final override
   {
-    if (optimize > 0 && avropt_fuse_move > 0)
+    if (optimize && avropt_fuse_move2)
       {
 	bool changed = false;
 	basic_block bb;
diff --git a/gcc/config/avr/avr.opt b/gcc/config/avr/avr.opt
index 988311927bd..7f6f18c3f23 100644
--- a/gcc/config/avr/avr.opt
+++ b/gcc/config/avr/avr.opt
@@ -164,6 +164,10 @@ mfuse-move=
 Target Joined RejectNegative UInteger Var(avropt_fuse_move) Init(0) Optimization IntegerRange(0, 23)
 -mfuse-move=<0,23>	Optimization. Run a post-reload pass that tweaks move instructions.
 
+mfuse-move2
+Target Var(avropt_fuse_move2) Init(0) Optimization
+Optimization. Fuse some move insns after insn combine.
+
 mabsdata
 Target Mask(ABSDATA)
 Assume that all data in static storage can be accessed by LDS / STS instructions.  This option is only useful for reduced Tiny devices like ATtiny40.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 09802303254..ce139213ee7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -915,7 +915,7 @@ Objective-C and Objective-C++ Dialects}.
 @emph{AVR Options} (@ref{AVR Options})
 @gccoptlist{-mmcu=@var{mcu}  -mabsdata  -maccumulate-args  -mcvt
 -mbranch-cost=@var{cost}  -mfuse-add=@var{level}  -mfuse-move=@var{level}
--mcall-prologues  -mgas-isr-prologues  -mint8  -mflmap
+-mfuse-move2  -mcall-prologues  -mgas-isr-prologues  -mint8  -mflmap
 -mdouble=@var{bits}  -mlong-double=@var{bits}  -mno-call-main
 -mn_flash=@var{size}  -mfract-convert-truncate  -mno-interrupts
 -mmain-is-OS_task  -mrelax  -mrmw  -mstrict-X  -mtiny-stack
@@ -25110,6 +25110,10 @@ Valid values for @var{level} are in the range @code{0} @dots{} @code{23}
 which is a 3:2:2:2 mixed radix value.  Each digit controls some
 aspect of the optimization.
 
+@opindex mfuse-move2
+@item -mfuse-move2
+Run a post combine optimization pass that tries to fuse move instructions.
+
 @opindex mstrict-X
 @item -mstrict-X
 Use address register @code{X} in a way proposed by the hardware.  This means
