on 2021/7/14 下午3:45, Kewen.Lin via Gcc-patches wrote:
> on 2021/7/14 下午2:38, Richard Biener wrote:
>> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin <li...@linux.ibm.com> wrote:
>>>
>>> on 2021/7/13 下午8:42, Richard Biener wrote:
>>>> On Tue, Jul 13, 2021 at 12:25 PM Kewen.Lin <li...@linux.ibm.com> wrote:
>>
>>> I guess the proposed IFN would be directly mapped for [us]mul_highpart?
>>
>> Yes.
>>
> 
> Thanks for confirming!  The related patch v2 is attached and the testing
> is ongoing.
> 

It's bootstrapped & regtested on powerpc64le-linux-gnu P9 and
aarch64-linux-gnu.  But on x86_64-redhat-linux there are XPASSes as below:

XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhuw
XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw
XFAIL->XPASS: gcc.target/i386/pr100637-3w.c scan-assembler pmulhw

They weren't exposed in the testing run with the previous patch which
doesn't use IFN way.  By investigating it, the difference comes from
the different costing on MULT_HIGHPART_EXPR and IFN_MULH.

For MULT_HIGHPART_EXPR, it's costed by 16 from below call:

        case MULT_EXPR:
        case WIDEN_MULT_EXPR:
        case MULT_HIGHPART_EXPR:
          stmt_cost = ix86_multiplication_cost (ix86_cost, mode);

While for IFN_MULH, it's costed by 4 as normal stmt so the total cost
becomes profitable and the expected vectorization happens.

One conservative fix seems to make IFN_MULH costing go through the
unique cost interface for multiplication, that is:

      case CFN_MULH:
        stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
        break;

As the test case marks the checks as "xfail", probably it's good to
revisit the costing on mul_highpart to ensure it's not priced more.

The attached patch also addressed Richard S.'s review comments on
two reformatting hunks.  Is it ok for trunk?

BR,
Kewen
-----
gcc/ChangeLog:

        * internal-fn.c (first_commutative_argument): Add info for IFN_MULH.
        * internal-fn.def (IFN_MULH): New internal function.
        * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
        recog normal multiply highpart as IFN_MULH.
        * config/i386/i386.c (ix86_add_stmt_cost): Adjust for combined
        function CFN_MULH.
---
 gcc/config/i386/i386.c   |  3 +++
 gcc/internal-fn.c        |  1 +
 gcc/internal-fn.def      |  2 ++
 gcc/tree-vect-patterns.c | 38 ++++++++++++++++++++++++++++----------
 4 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a93128fa0a4..1dd9108353c 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22559,6 +22559,9 @@ ix86_add_stmt_cost (class vec_info *vinfo, void *data, 
int count,
                                   mode == SFmode ? ix86_cost->fmass
                                   : ix86_cost->fmasd);
        break;
+      case CFN_MULH:
+       stmt_cost = ix86_multiplication_cost (ix86_cost, mode);
+       break;
       default:
        break;
       }
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index fb8b43d1ce2..b1b4289357c 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -3703,6 +3703,7 @@ first_commutative_argument (internal_fn fn)
     case IFN_FNMS:
     case IFN_AVG_FLOOR:
     case IFN_AVG_CEIL:
+    case IFN_MULH:
     case IFN_MULHS:
     case IFN_MULHRS:
     case IFN_FMIN:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index c3b8e730960..ed6d7de1680 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -169,6 +169,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_FLOOR, ECF_CONST | 
ECF_NOTHROW, first,
 DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_CEIL, ECF_CONST | ECF_NOTHROW, first,
                              savg_ceil, uavg_ceil, binary)
 
+DEF_INTERNAL_SIGNED_OPTAB_FN (MULH, ECF_CONST | ECF_NOTHROW, first,
+                             smul_highpart, umul_highpart, binary)
 DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST | ECF_NOTHROW, first,
                              smulhs, umulhs, binary)
 DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index b2e7fc2cc7a..ada89d7060b 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -1896,8 +1896,15 @@ vect_recog_over_widening_pattern (vec_info *vinfo,
 
    1) Multiply high with scaling
      TYPE res = ((TYPE) a * (TYPE) b) >> c;
+     Here, c is bitsize (TYPE) / 2 - 1.
+
    2) ... or also with rounding
      TYPE res = (((TYPE) a * (TYPE) b) >> d + 1) >> 1;
+     Here, d is bitsize (TYPE) / 2 - 2.
+
+   3) Normal multiply high
+     TYPE res = ((TYPE) a * (TYPE) b) >> e;
+     Here, e is bitsize (TYPE) / 2.
 
    where only the bottom half of res is used.  */
 
@@ -1942,7 +1949,6 @@ vect_recog_mulhs_pattern (vec_info *vinfo,
   stmt_vec_info mulh_stmt_info;
   tree scale_term;
   internal_fn ifn;
-  unsigned int expect_offset;
 
   /* Check for the presence of the rounding term.  */
   if (gimple_assign_rhs_code (rshift_input_stmt) == PLUS_EXPR)
@@ -1991,25 +1997,37 @@ vect_recog_mulhs_pattern (vec_info *vinfo,
 
       /* Get the scaling term.  */
       scale_term = gimple_assign_rhs2 (plus_input_stmt);
+      /* Check that the scaling factor is correct.  */
+      if (TREE_CODE (scale_term) != INTEGER_CST)
+       return NULL;
+
+      /* Check pattern 2).  */
+      if (wi::to_widest (scale_term) + target_precision + 2
+         != TYPE_PRECISION (lhs_type))
+       return NULL;
 
-      expect_offset = target_precision + 2;
       ifn = IFN_MULHRS;
     }
   else
     {
       mulh_stmt_info = rshift_input_stmt_info;
       scale_term = gimple_assign_rhs2 (last_stmt);
+      /* Check that the scaling factor is correct.  */
+      if (TREE_CODE (scale_term) != INTEGER_CST)
+       return NULL;
 
-      expect_offset = target_precision + 1;
-      ifn = IFN_MULHS;
+      /* Check for pattern 1).  */
+      if (wi::to_widest (scale_term) + target_precision + 1
+         == TYPE_PRECISION (lhs_type))
+       ifn = IFN_MULHS;
+      /* Check for pattern 3).  */
+      else if (wi::to_widest (scale_term) + target_precision
+              == TYPE_PRECISION (lhs_type))
+       ifn = IFN_MULH;
+      else
+       return NULL;
     }
 
-  /* Check that the scaling factor is correct.  */
-  if (TREE_CODE (scale_term) != INTEGER_CST
-      || wi::to_widest (scale_term) + expect_offset
-          != TYPE_PRECISION (lhs_type))
-    return NULL;
-
   /* Check whether the scaling input term can be seen as two widened
      inputs multiplied together.  */
   vect_unpromoted_value unprom_mult[2];
-- 
2.17.1

Reply via email to