Hi Richi, Thanks for the comments!
on 2021/7/13 下午5:35, Richard Biener wrote: > On Tue, Jul 13, 2021 at 10:53 AM Kewen.Lin <li...@linux.ibm.com> wrote: >> >> Hi, >> >> When I added the support for Power10 newly introduced multiply >> highpart instrutions, I noticed that currently vectorizer >> doesn't try to vectorize multiply highpart pattern, I hope >> this isn't intentional? >> >> This patch is to extend the existing pattern mulhs handlings >> to cover multiply highpart. Another alternative seems to >> recog mul_highpart operation in a general place applied for >> scalar code when the target supports the optab for the scalar >> operation, it's based on the assumption that one target which >> supports vector version of multiply highpart should have the >> scalar version. I noticed that the function can_mult_highpart_p >> can check/handle mult_highpart well even without mul_highpart >> optab support, I think to recog this pattern in vectorizer >> is better. Is it on the right track? > > I think it's on the right track, using IFN_LAST is a bit awkward > in case yet another case pops up so maybe you can use > a code_helper instance instead which unifies tree_code, > builtin_code and internal_fn? > If there is one new requirement which doesn't have/introduce IFN stuffs but have one existing tree_code, can we add one more field with type tree_code, then for the IFN_LAST path we can check the different requirements under the guard with that tree_code variable? > I also notice that can_mult_highpart_p will return true if > only vec_widen_[us]mult_{even,odd,hi,lo} are available, > but then the result might be less optimal (or even not > handled later)? > I think it will be handled always? The expander calls rtx expand_mult_highpart (machine_mode mode, rtx op0, rtx op1, rtx target, bool uns_p) which will further check with can_mult_highpart_p. For the below case, #define SHT_CNT 16 __attribute__ ((noipa)) void test () { for (int i = 0; i < N; i++) sh_c[i] = ((SI) sh_a[i] * (SI) sh_b[i]) >> 16; } Without this patch, it use widen_mult like below: vect__1.5_19 = MEM <vector(8) short int> [(short int *)&sh_a + ivtmp.18_24 * 1]; vect__3.8_14 = MEM <vector(8) short int> [(short int *)&sh_b + ivtmp.18_24 * 1]; vect_patt_22.9_13 = WIDEN_MULT_LO_EXPR <vect__3.8_14, vect__1.5_19>; vect_patt_22.9_9 = WIDEN_MULT_HI_EXPR <vect__3.8_14, vect__1.5_19>; vect__6.10_25 = vect_patt_22.9_13 >> 16; vect__6.10_26 = vect_patt_22.9_9 >> 16; vect__7.11_27 = VEC_PACK_TRUNC_EXPR <vect__6.10_25, vect__6.10_26>; MEM <vector(8) short int> [(short int *)&sh_c + ivtmp.18_24 * 1] = vect__7.11_27; .L2: lxvx 33,7,9 lxvx 32,8,9 vmulosh 13,0,1 // widen mult vmulesh 0,0,1 xxmrglw 33,32,45 // merge xxmrghw 32,32,45 vsraw 1,1,12 // shift vsraw 0,0,12 vpkuwum 0,0,1 // pack stxvx 32,10,9 addi 9,9,16 bdnz .L2 With this patch, it ends up with: vect__1.5_14 = MEM <vector(8) short int> [(short int *)&sh_a + ivtmp.17_24 * 1]; vect__3.8_8 = MEM <vector(8) short int> [(short int *)&sh_b + ivtmp.17_24 * 1]; vect_patt_21.9_25 = vect__3.8_8 h* vect__1.5_14; MEM <vector(8) short int> [(short int *)&sh_c + ivtmp.17_24 * 1] = vect_patt_21.9_25; .L2: lxvx 32,8,9 lxvx 33,10,9 vmulosh 13,0,1 // widen mult vmulesh 0,0,1 vperm 0,0,13,12 // perm on widen mults stxvx 32,7,9 addi 9,9,16 bdnz .L2 > That is, what about adding optab internal functions > for [us]mul_highpart instead, much like the existing > ones for MULH{R,}S? > OK, I was thinking the IFN way at the beginning, but was worried that it's easy to be blamed saying it's not necessary since there is one existing tree_code. :-) Will update it with IFN way. BR, Kewen > Richard. > >> >> Bootstrapped & regtested on powerpc64le-linux-gnu P9, >> x86_64-redhat-linux and aarch64-linux-gnu. >> >> BR, >> Kewen >> ----- >> gcc/ChangeLog: >> >> * tree-vect-patterns.c (vect_recog_mulhs_pattern): Add support to >> recog normal multiply highpart. >>