Hi Richi,

Thanks for the comments!

on 2021/7/13 下午5:35, Richard Biener wrote:
> On Tue, Jul 13, 2021 at 10:53 AM Kewen.Lin <li...@linux.ibm.com> wrote:
>>
>> Hi,
>>
>> When I added the support for Power10 newly introduced multiply
>> highpart instrutions, I noticed that currently vectorizer
>> doesn't try to vectorize multiply highpart pattern, I hope
>> this isn't intentional?
>>
>> This patch is to extend the existing pattern mulhs handlings
>> to cover multiply highpart.  Another alternative seems to
>> recog mul_highpart operation in a general place applied for
>> scalar code when the target supports the optab for the scalar
>> operation, it's based on the assumption that one target which
>> supports vector version of multiply highpart should have the
>> scalar version.  I noticed that the function can_mult_highpart_p
>> can check/handle mult_highpart well even without mul_highpart
>> optab support, I think to recog this pattern in vectorizer
>> is better.  Is it on the right track?
> 
> I think it's on the right track, using IFN_LAST is a bit awkward
> in case yet another case pops up so maybe you can use
> a code_helper instance instead which unifies tree_code,
> builtin_code and internal_fn?
> 

If there is one new requirement which doesn't have/introduce IFN
stuffs but have one existing tree_code, can we add one more field
with type tree_code, then for the IFN_LAST path we can check the
different requirements under the guard with that tree_code variable?

> I also notice that can_mult_highpart_p will return true if
> only vec_widen_[us]mult_{even,odd,hi,lo} are available,
> but then the result might be less optimal (or even not
> handled later)?
> 

I think it will be handled always?  The expander calls

rtx
expand_mult_highpart (machine_mode mode, rtx op0, rtx op1,
                      rtx target, bool uns_p)

which will further check with can_mult_highpart_p.

For the below case, 

#define SHT_CNT 16

__attribute__ ((noipa)) void
test ()
{
  for (int i = 0; i < N; i++)
    sh_c[i] = ((SI) sh_a[i] * (SI) sh_b[i]) >> 16;
}

Without this patch, it use widen_mult like below:

  vect__1.5_19 = MEM <vector(8) short int> [(short int *)&sh_a + ivtmp.18_24 * 
1];
  vect__3.8_14 = MEM <vector(8) short int> [(short int *)&sh_b + ivtmp.18_24 * 
1];
  vect_patt_22.9_13 = WIDEN_MULT_LO_EXPR <vect__3.8_14, vect__1.5_19>;
  vect_patt_22.9_9 = WIDEN_MULT_HI_EXPR <vect__3.8_14, vect__1.5_19>;
  vect__6.10_25 = vect_patt_22.9_13 >> 16;
  vect__6.10_26 = vect_patt_22.9_9 >> 16;
  vect__7.11_27 = VEC_PACK_TRUNC_EXPR <vect__6.10_25, vect__6.10_26>;
  MEM <vector(8) short int> [(short int *)&sh_c + ivtmp.18_24 * 1] = 
vect__7.11_27;

.L2:
        lxvx 33,7,9
        lxvx 32,8,9
        vmulosh 13,0,1    // widen mult
        vmulesh 0,0,1
        xxmrglw 33,32,45  // merge
        xxmrghw 32,32,45
        vsraw 1,1,12      // shift
        vsraw 0,0,12
        vpkuwum 0,0,1     // pack
        stxvx 32,10,9
        addi 9,9,16
        bdnz .L2


With this patch, it ends up with:

  vect__1.5_14 = MEM <vector(8) short int> [(short int *)&sh_a + ivtmp.17_24 * 
1];
  vect__3.8_8 = MEM <vector(8) short int> [(short int *)&sh_b + ivtmp.17_24 * 
1];
  vect_patt_21.9_25 = vect__3.8_8 h* vect__1.5_14;
  MEM <vector(8) short int> [(short int *)&sh_c + ivtmp.17_24 * 1] = 
vect_patt_21.9_25;

.L2:
        lxvx 32,8,9
        lxvx 33,10,9
        vmulosh 13,0,1   // widen mult
        vmulesh 0,0,1
        vperm 0,0,13,12  // perm on widen mults
        stxvx 32,7,9
        addi 9,9,16
        bdnz .L2


> That is, what about adding optab internal functions
> for [us]mul_highpart instead, much like the existing
> ones for MULH{R,}S?
> 

OK, I was thinking the IFN way at the beginning, but was worried
that it's easy to be blamed saying it's not necessary since there
is one existing tree_code.  :-)  Will update it with IFN way.

BR,
Kewen

> Richard.
> 
>>
>> Bootstrapped & regtested on powerpc64le-linux-gnu P9,
>> x86_64-redhat-linux and aarch64-linux-gnu.
>>
>> BR,
>> Kewen
>> -----
>> gcc/ChangeLog:
>>
>>         * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to
>>         recog normal multiply highpart.
>>


Reply via email to