https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123997

            Bug ID: 123997
           Summary: Missing patterns for masked vector multiplication with
                    memory operand
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

Trying 48 -> 49:
   48: r134:V8DF=vec_merge(unspec[[r129:DI*0x8+r121:DI]] 178,const_vector,r118:QI)
      REG_DEAD r129:DI
      REG_DEAD r121:DI
   49: r133:V8DF=vec_merge(r110:V8DF*r134:V8DF,const_vector,r118:QI)
      REG_DEAD r134:V8DF
      REG_DEAD r110:V8DF
Failed to match this instruction:
(set (reg:V8DF 133)
    (vec_merge:V8DF (mult:V8DF (vec_merge:V8DF (unspec:V8DF [ 
                        (mem:V8DF (plus:DI (mult:DI (reg:DI 129 [ _66 ])
                                    (const_int 8 [0x8]))
                                (reg/v/f:DI 121 [ in1 ])) [1  S64 A64])
                    ] UNSPEC_MASKLOAD)
                (const_vector:V8DF [
                        (const_double:DF 0.0 [0x0.0p+0]) repeated x8
                    ])
                (reg:QI 118 [ _89 ]))
            (reg:V8DF 110 [ vect__23.21 ]))
        (const_vector:V8DF [
                (const_double:DF 0.0 [0x0.0p+0]) repeated x8
            ])
        (reg:QI 118 [ _89 ])))

or, with -Ofast, where the multiplication is originally unmasked but the
load is masked:

Trying 48 -> 49:
   48: r133:V8DF=vec_merge(unspec[[r129:DI*0x8+r121:DI]] 178,const_vector,r118:QI)
      REG_DEAD r129:DI
      REG_DEAD r121:DI
   49: r134:V8DF=r133:V8DF*r110:V8DF
      REG_DEAD r133:V8DF
      REG_DEAD r110:V8DF
Failed to match this instruction:
(set (reg:V8DF 134 [ vect__9.25_78 ])
    (mult:V8DF (vec_merge:V8DF (unspec:V8DF [
                    (mem:V8DF (plus:DI (mult:DI (reg:DI 129 [ _66 ])
                                (const_int 8 [0x8]))
                            (reg/v/f:DI 121 [ in1 ])) [1  S64 A64])
                ] UNSPEC_MASKLOAD)
            (const_vector:V8DF [
                    (const_double:DF 0.0 [0x0.0p+0]) repeated x8
                ])
            (reg:QI 118 [ _89 ]))
        (reg:V8DF 110 [ vect__23.21 ])))


Testcase, compile with -O{3,fast} -march=x86-64-v4
--param vect-partial-vector-usage=1

void foo(double * restrict out,
         double *in0,
         double *in1,
         int N) {
    for ( int i = 0 ; i < N ; i++ ) {
        out[i] = in0[i] * in1[i];
    }
}

and you'll get masked epilogue assembly like

        subl    %eax, %ecx
        vpbroadcastd    %ecx, %ymm0
        vpcmpud $6, .LC0(%rip), %ymm0, %k1
        vmovupd (%r9,%rax,8), %zmm2{%k1}{z}
        vmovupd (%r8,%rax,8), %zmm1{%k1}{z}
        vmulpd  %zmm1, %zmm2, %zmm0{%k1}{z}   <----
        vmovupd %zmm0, (%rdi,%rax,8){%k1}

where the indicated multiplication could use a memory operand.  With -Ofast
the multiplication is instead unmasked:

        vmovupd (%r9,%rax,8), %zmm1{%k1}{z}
        vmovupd (%r8,%rax,8), %zmm0{%k1}{z}
        vmulpd  %zmm1, %zmm0, %zmm0
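
For illustration, here is a lane-wise scalar model of why folding the masked
load into the masked multiplication should be safe.  This is only a sketch:
the helper names (maskz_load, maskz_mul, maskz_mul_mem, fold_is_equivalent)
are hypothetical and are neither GCC nor intrinsic names, and the model
assumes zero-masking where inactive lanes of a memory operand are never
accessed (fault suppression).

```c
#include <stdint.h>

#define LANES 8 /* V8DF: eight doubles in a 512-bit vector */

/* Model of a zero-masking load: inactive lanes are not read and become 0.0. */
static void maskz_load(double dst[LANES], const double *mem, uint8_t k)
{
    for (int i = 0; i < LANES; i++)
        dst[i] = ((k >> i) & 1) ? mem[i] : 0.0;
}

/* Model of a zero-masking multiply with two register operands. */
static void maskz_mul(double dst[LANES], const double a[LANES],
                      const double b[LANES], uint8_t k)
{
    for (int i = 0; i < LANES; i++)
        dst[i] = ((k >> i) & 1) ? a[i] * b[i] : 0.0;
}

/* Model of the wanted form: zero-masking multiply with a memory operand.
   Inactive lanes of MEM are never touched (assumed fault suppression). */
static void maskz_mul_mem(double dst[LANES], const double a[LANES],
                          const double *mem, uint8_t k)
{
    for (int i = 0; i < LANES; i++)
        dst[i] = ((k >> i) & 1) ? a[i] * mem[i] : 0.0;
}

/* Return 1 if load + multiply and the fused multiply agree in all lanes. */
int fold_is_equivalent(const double *in0, const double *in1, uint8_t k)
{
    double a[LANES], b[LANES], two_step[LANES], fused[LANES];

    maskz_load(a, in0, k);
    maskz_load(b, in1, k);
    maskz_mul(two_step, a, b, k);     /* what is emitted today */
    maskz_mul_mem(fused, a, in1, k);  /* what combine fails to match */

    for (int i = 0; i < LANES; i++)
        if (two_step[i] != fused[i])
            return 0;
    return 1;
}
```

Since both forms read in1 only in lanes enabled by the same mask, the model
also works when fewer than eight elements remain, as in the epilogue.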


I suspect we'd see quite an explosion in patterns if we wanted to handle
this memory forwarding (and the same for other operations that can do fault
suppression) via combine.  I'm unsure whether there's another, better way to
achieve such forwarding, perhaps with some md-reorg?
