I would like to see a comment to the effect that this is to allow inlining of the common case for widest_int and offset_int without inlining the uncommon case for regular wide_int.



On 11/28/2013 12:38 PM, Richard Sandiford wrote:
Currently add and sub have no fast path for offset_int and widest_int;
they just call the out-of-line version.  This patch handles the
single-HWI cases (both operands fit in one HOST_WIDE_INT) inline.  At
least on x86_64, this adds only one branch per call; the fast path itself
is straight-line code.
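The single-HWI add fast path can be modelled outside GCC. The following is a minimal sketch, assuming a 64-bit HOST_WIDE_INT; `two_word`, `add_single_words` and `to_int128` are hypothetical names, not wide-int.h API, and `to_int128` relies on the GCC/Clang `__int128` extension purely to check the result:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-word stand-in for a wide integer (not GCC's wide_int):
// val[0] is the low word, val[1] the high word, and len says how many
// words are significant.  With len == 1 the value is val[0], sign-extended.
struct two_word
{
  int64_t val[2];
  unsigned len;
};

// Add two values that each fit in one signed 64-bit word, mirroring the
// straight-line code in the patch's wi::add fast path.
two_word
add_single_words (int64_t x, int64_t y)
{
  uint64_t xl = (uint64_t) x;
  uint64_t yl = (uint64_t) y;
  uint64_t resultl = xl + yl;   // unsigned add wraps modulo 2^64
  two_word r;
  r.val[0] = (int64_t) resultl;
  // Only used when the low word overflowed: its sign bit is then wrong,
  // so the true high word is the opposite of that sign.
  r.val[1] = (int64_t) resultl < 0 ? 0 : -1;
  // Signed overflow iff the result's sign differs from both operand signs.
  r.len = 1 + (unsigned) (((resultl ^ xl) & (resultl ^ yl)) >> 63);
  return r;
}

// Rebuild the exact value (GCC/Clang __int128 extension).
__int128
to_int128 (const two_word &r)
{
  assert (r.len == 1 || r.len == 2);
  if (r.len == 1)
    return r.val[0];   // implicit sign extension
  unsigned __int128 hi = (unsigned __int128) (uint64_t) r.val[1] << 64;
  return (__int128) (hi | (uint64_t) r.val[0]);
}
```

For example, INT64_MAX + 1 yields val[0] == INT64_MIN, val[1] == 0 and len == 2, which together represent 2^63.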

On the same fold-const.ii testcase, this reduces the number of
add_large calls from 877507 to 42459.  It reduces the number of
sub_large calls from 25707 to 148.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.h
===================================================================
--- gcc/wide-int.h      2013-11-28 13:34:19.596839877 +0000
+++ gcc/wide-int.h      2013-11-28 16:08:11.387731775 +0000
@@ -2234,6 +2234,17 @@ wi::add (const T1 &x, const T2 &y)
        val[0] = xi.ulow () + yi.ulow ();
        result.set_len (1);
      }
+  else if (STATIC_CONSTANT_P (precision > HOST_BITS_PER_WIDE_INT)
+          && xi.len + yi.len == 2)
+    {
+      unsigned HOST_WIDE_INT xl = xi.ulow ();
+      unsigned HOST_WIDE_INT yl = yi.ulow ();
+      unsigned HOST_WIDE_INT resultl = xl + yl;
+      val[0] = resultl;
+      val[1] = (HOST_WIDE_INT) resultl < 0 ? 0 : -1;
+      result.set_len (1 + (((resultl ^ xl) & (resultl ^ yl))
+                          >> (HOST_BITS_PER_WIDE_INT - 1)));
+    }
    else
      result.set_len (add_large (val, xi.val, xi.len,
                               yi.val, yi.len, precision,
@@ -2288,6 +2299,17 @@ wi::sub (const T1 &x, const T2 &y)
        val[0] = xi.ulow () - yi.ulow ();
        result.set_len (1);
      }
+  else if (STATIC_CONSTANT_P (precision > HOST_BITS_PER_WIDE_INT)
+          && xi.len + yi.len == 2)
+    {
+      unsigned HOST_WIDE_INT xl = xi.ulow ();
+      unsigned HOST_WIDE_INT yl = yi.ulow ();
+      unsigned HOST_WIDE_INT resultl = xl - yl;
+      val[0] = resultl;
+      val[1] = (HOST_WIDE_INT) resultl < 0 ? 0 : -1;
+      result.set_len (1 + (((resultl ^ xl) & (xl ^ yl))
+                          >> (HOST_BITS_PER_WIDE_INT - 1)));
+    }
    else
      result.set_len (sub_large (val, xi.val, xi.len,
                               yi.val, yi.len, precision,

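The two fast paths differ only in their overflow test: add uses (resultl ^ xl) & (resultl ^ yl), while sub uses (resultl ^ xl) & (xl ^ yl). A minimal standalone check of both predicates, assuming a 64-bit HOST_WIDE_INT and using the GCC/Clang `__int128` extension as the reference; `add_overflows`, `sub_overflows`, `needs_two_words` and `predicates_match_reference` are hypothetical names, not part of wide-int.h:

```cpp
#include <cstdint>

// Overflow predicates from the patch, specialised to 64-bit words: each
// returns 1 when the result needs a second word (len == 2).
unsigned
add_overflows (uint64_t xl, uint64_t yl)
{
  uint64_t resultl = xl + yl;
  // x + y overflows only if x and y share a sign and the result's
  // sign differs from it.
  return (unsigned) (((resultl ^ xl) & (resultl ^ yl)) >> 63);
}

unsigned
sub_overflows (uint64_t xl, uint64_t yl)
{
  uint64_t resultl = xl - yl;
  // x - y overflows only if x and y have different signs and the
  // result's sign differs from x's.
  return (unsigned) (((resultl ^ xl) & (xl ^ yl)) >> 63);
}

// Reference: does the exact result fall outside the int64_t range?
bool
needs_two_words (__int128 exact)
{
  return exact < INT64_MIN || exact > INT64_MAX;
}

// Compare both predicates with the exact result for every pair drawn
// from a small set of boundary values.
bool
predicates_match_reference ()
{
  const int64_t samples[] = { INT64_MIN, INT64_MIN + 1, -2, -1, 0, 1, 2,
                              INT64_MAX - 1, INT64_MAX };
  for (int64_t x : samples)
    for (int64_t y : samples)
      {
        __int128 sum = (__int128) x + y;
        __int128 diff = (__int128) x - y;
        if (add_overflows ((uint64_t) x, (uint64_t) y)
            != (unsigned) needs_two_words (sum))
          return false;
        if (sub_overflows ((uint64_t) x, (uint64_t) y)
            != (unsigned) needs_two_words (diff))
          return false;
      }
  return true;
}
```

Reusing the add-style test for sub would miss cases such as 0 - INT64_MIN: there resultl equals yl, so (resultl ^ yl) is zero even though the true result, 2^63, needs two words.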
