Hi,

Improved canonicalization after r216728 causes GCC to more often generate poor 
code due to suboptimal ordering of the operands of commutative libcalls. 
Indeed, if one of the operands of a commutative operation is the result of a 
previous operation, both operations being implemented by libcalls, the wrong 
ordering of the operands in the second operation can lead to extra mov 
instructions. Consider the following case on softfloat targets:

double
test1 (double x, double y)
{
  return x * (x + y);
}

If the operand x + y is expanded into the same registers as the return value 
of the libcall computing x + y, then no mov is generated; otherwise movs are 
needed. With the right ordering, the following is generated on arm softfloat:

bl      __aeabi_dadd
ldr     r2, [sp]
ldr     r3, [sp, #4]
/* r0, r1 are reused from the return values of the __aeabi_dadd libcall. */
bl      __aeabi_dmul

With the wrong ordering one gets:

bl      __aeabi_dadd
mov     r2, r0
mov     r3, r1
ldr     r0, [sp]
ldr     r1, [sp, #4]
bl      __aeabi_dmul

This patch extends the patch written by Yuri Rumyantsev in r219646 to also 
deal with the case where only one of the operands is the result of an 
operation. ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2015-03-05  Thomas Preud'homme  <thomas.preudho...@arm.com>

        PR tree-optimization/63743
        * cfgexpand.c (reorder_operands): Also reorder if only second operand
        had its definition forwarded by TER.

*** gcc/testsuite/ChangeLog ***

2015-03-05  Thomas Preud'homme  <thomas.preudho...@arm.com>

        PR tree-optimization/63743
        * gcc.dg/pr63743.c: New test.


diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 7dfe1f6..4fbc037 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -5117,13 +5117,11 @@ reorder_operands (basic_block bb)
        continue;
       /* Swap operands if the second one is more expensive.  */
       def0 = get_gimple_for_ssa_name (op0);
-      if (!def0)
-       continue;
       def1 = get_gimple_for_ssa_name (op1);
       if (!def1)
        continue;
       swap = false;
-      if (lattice[gimple_uid (def1)] > lattice[gimple_uid (def0)])
+      if (!def0 || lattice[gimple_uid (def1)] > lattice[gimple_uid (def0)])
        swap = true;
       if (swap)
        {
@@ -5132,7 +5130,7 @@ reorder_operands (basic_block bb)
              fprintf (dump_file, "Swap operands in stmt:\n");
              print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
              fprintf (dump_file, "Cost left opnd=%d, right opnd=%d\n",
-                      lattice[gimple_uid (def0)],
+                      def0 ? lattice[gimple_uid (def0)] : 0,
                       lattice[gimple_uid (def1)]);
            }
          swap_ssa_operands (stmt, gimple_assign_rhs1_ptr (stmt),
diff --git a/gcc/testsuite/gcc.dg/pr63743.c b/gcc/testsuite/gcc.dg/pr63743.c
new file mode 100644
index 0000000..87254ed
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr63743.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-rtl-expand-details" } */
+
+double
+libcall_dep (double x, double y)
+{
+  return x * (x + y);
+}
+
+/* { dg-final { scan-rtl-dump-times "Swap operands" 1 "expand" } } */
+/* { dg-final { cleanup-rtl-dump "expand" } } */


The testsuite was run under QEMU with an arm-none-eabi GCC cross-compiler 
targeting Cortex-M3, and natively with a bootstrapped x86_64 GCC, without any 
regression.
CSiBE shows a 0.5034% code size decrease on arm-none-eabi and a 0.0058% code 
size increase on x86_64-linux-gnu.

Is it ok for trunk (since it fixes a code size regression in 5.0)?

Best regards,

Thomas
