[Bug rtl-optimization/89445] [9 regression] _mm512_maskz_loadu_pd "forgets" to use the mask

jakub at gcc dot gnu.org Fri, 22 Feb 2019 08:41:57 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89445


Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-02-22
                 CC|                            |jakub at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Indeed.  Before that change we have:
Trying 36, 47 -> 51:
   36: r117:V8DF=vec_merge([r107:DI+r103:DI],const_vector,r90:HI#0)
   47: r124:V8DF={r117:V8DF*r94:V8DF+r121:V8DF}
      REG_DEAD r121:V8DF
      REG_DEAD r117:V8DF
   51: [r100:DI]=vec_merge(r124:V8DF,[r100:DI],r90:HI#0)
      REG_DEAD r100:DI
      REG_DEAD r124:V8DF
Failed to match this instruction:
(set (mem:V8DF (reg/f:DI 100 [ _30 ]) [0  S64 A8])
    (vec_merge:V8DF (fma:V8DF (vec_merge:V8DF (mem:V8DF (plus:DI (reg/v/f:DI 107 [ x ])
                        (reg/v:DI 103 [ i ])) [0  S64 A8])
                (const_vector:V8DF [
                        (const_double:DF 0.0 [0x0.0p+0])
                        (const_double:DF 0.0 [0x0.0p+0])
                        (const_double:DF 0.0 [0x0.0p+0])
                        (const_double:DF 0.0 [0x0.0p+0])
                        (const_double:DF 0.0 [0x0.0p+0])
                        (const_double:DF 0.0 [0x0.0p+0])
                        (const_double:DF 0.0 [0x0.0p+0])
                        (const_double:DF 0.0 [0x0.0p+0])
                    ])
                (subreg:QI (reg/v:HI 90 [ mask ]) 0))
            (reg:V8DF 94 [ _18 ])
            (reg:V8DF 121))
        (mem:V8DF (reg/f:DI 100 [ _30 ]) [0  S64 A8])
        (subreg:QI (reg/v:HI 90 [ mask ]) 0)))
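To make the pre-change pattern concrete, here is a scalar C model of what that combined RTL means. It is a sketch under stated assumptions, not GCC code: the inner vec_merge of a MEM is taken to behave like an AVX-512 masked load that never touches masked-off lanes, and the hypothetical `avail` parameter says how many doubles at `x` are actually readable, with a read past it standing in for a page fault. `x`, `y`, `z` and `dst` correspond to [r107+r103], r94, r121 and [r100] in the dump; the function name is illustrative only, and a plain multiply-add stands in for the fused one.

```c
#include <stddef.h>

#define LANES 8

/* Scalar model of the pre-change RTL (insns 36, 47 and 51 combined):
 *   dst = vec_merge (fma (vec_merge (mem x, 0.0, m), y, z), dst, m)
 * The inner vec_merge is modelled as a masked load: lanes whose mask bit
 * is clear are zeroed and never read from memory.  AVAIL (hypothetical)
 * is how many doubles at X are readable.  Returns 0 on success, -1 if the
 * modelled code would fault. */
static int pre_change_v8df(double dst[LANES], const double *x, size_t avail,
                           const double y[LANES], const double z[LANES],
                           unsigned mask)
{
    double loaded[LANES];

    for (int i = 0; i < LANES; i++) {
        if ((mask >> i) & 1u) {
            if ((size_t) i >= avail)
                return -1;          /* an active lane really is unreadable */
            loaded[i] = x[i];       /* active lane: read from memory */
        } else {
            loaded[i] = 0.0;        /* masked-off lane: zero, no access */
        }
    }

    /* Outer vec_merge is a masked store: results in masked-off lanes are
       discarded, so they are not computed here.  Unfused multiply-add. */
    for (int i = 0; i < LANES; i++)
        if ((mask >> i) & 1u)
            dst[i] = loaded[i] * y[i] + z[i];
    return 0;
}
```

With mask 0x0F and only four readable doubles at `x`, this model completes without a fault, because lanes 4 to 7 are never read.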
With the change:
Trying 36, 47 -> 51:
   36: r117:V8DF=vec_merge([r107:DI+r103:DI],const_vector,r90:HI#0)
   47: r124:V8DF={r117:V8DF*r94:V8DF+r121:V8DF}
      REG_DEAD r121:V8DF
      REG_DEAD r117:V8DF
   51: [r100:DI]=vec_merge(r124:V8DF,[r100:DI],r90:HI#0)
      REG_DEAD r100:DI
      REG_DEAD r124:V8DF
Failed to match this instruction:
(set (mem:V8DF (reg/f:DI 100 [ _30 ]) [0  S64 A8])
    (vec_merge:V8DF (fma:V8DF (mem:V8DF (plus:DI (reg/v/f:DI 107 [ x ])
                    (reg/v:DI 103 [ i ])) [0  S64 A8])
            (reg:V8DF 94 [ _18 ])
            (reg:V8DF 121))
        (mem:V8DF (reg/f:DI 100 [ _30 ]) [0  S64 A8])
        (subreg:QI (reg/v:HI 90 [ mask ]) 0)))
Successfully matched this instruction:
(set (reg:V8DF 124)
    (fma:V8DF (mem:V8DF (plus:DI (reg/v/f:DI 107 [ x ])
                (reg/v:DI 103 [ i ])) [0  S64 A8])
        (reg:V8DF 94 [ _18 ])
        (reg:V8DF 121)))
Successfully matched this instruction:
(set (mem:V8DF (reg/f:DI 100 [ _30 ]) [0  S64 A8])
    (vec_merge:V8DF (reg:V8DF 124)
        (mem:V8DF (reg/f:DI 100 [ _30 ]) [0  S64 A8])
        (subreg:QI (reg/v:HI 90 [ mask ]) 0)))
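Unlike the masked load in the first dump, the two instructions matched after the change load all eight lanes unconditionally and apply the mask only at the store. The sketch below models that, assuming a hypothetical `avail` count of readable doubles at `x` where a read past it stands in for a page fault; the function name is illustrative and a plain multiply-add stands in for the fused one. It also hints at the floating-point side of the problem: the multiply-add now runs on real data in masked-off lanes, where it may raise exceptions that the masked form would not.

```c
#include <stddef.h>

#define LANES 8

/* Scalar model of the two insns combine produced after the change:
 *   r124 = fma (mem x, y, z)         -- plain, unmasked load
 *   dst  = vec_merge (r124, dst, m)  -- masked store
 * AVAIL (hypothetical) is how many doubles at X are readable.
 * Returns 0 on success, -1 if the modelled code would fault. */
static int post_change_v8df(double dst[LANES], const double *x, size_t avail,
                            const double y[LANES], const double z[LANES],
                            unsigned mask)
{
    double r124[LANES];

    /* Every lane is loaded and computed, whatever the mask says. */
    for (int i = 0; i < LANES; i++) {
        if ((size_t) i >= avail)
            return -1;              /* unmasked load touches every lane */
        r124[i] = x[i] * y[i] + z[i];
    }

    /* Only the store honours the mask. */
    for (int i = 0; i < LANES; i++)
        if ((mask >> i) & 1u)
            dst[i] = r124[i];
    return 0;
}
```

With mask 0x0F and only four readable doubles at `x`, this model reports a fault, even though the original intrinsic sequence would only read lanes 0 to 3.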

Something like simplify_merge_mask can be done only if there are no MEMs involved
in the operand (or they are guaranteed not to trap, via MEM_NOTRAP_P) and if none
of the operations can trap or have similar issues (so no floating-point ops and
no division by zero; anything else?).
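The conditions above can be sketched as a recursive check over a toy expression tree. Everything here is hypothetical: `struct toy_rtx`, its codes and `toy_may_drop_inner_mask` are illustrations, not GCC's rtx API or the actual simplify_merge_mask logic, and the rejected operations only mirror the list in the comment, which is itself flagged as possibly incomplete.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature RTL: only the codes needed for the argument. */
enum toy_code { TOY_REG, TOY_CONST, TOY_MEM, TOY_PLUS, TOY_MULT, TOY_DIV, TOY_FMA };

struct toy_rtx {
    enum toy_code code;
    bool is_float;              /* operation is done in floating point */
    bool mem_notrap;            /* TOY_MEM only: analogue of MEM_NOTRAP_P */
    int n_ops;
    const struct toy_rtx *ops[3];
};

/* Returns true if computing the masked-off lanes of X has no observable
   side effect, i.e. an inner vec_merge mask around X could be dropped
   under the conditions listed in the comment. */
static bool toy_may_drop_inner_mask(const struct toy_rtx *x)
{
    switch (x->code) {
    case TOY_REG:
    case TOY_CONST:
        return true;
    case TOY_MEM:
        /* Unmasked lanes may fault unless the MEM is known not to trap.
           The address expression is not inspected in this toy. */
        return x->mem_notrap;
    case TOY_DIV:
        return false;           /* division by zero may trap */
    case TOY_PLUS:
    case TOY_MULT:
    case TOY_FMA:
        if (x->is_float)
            return false;       /* FP ops can raise exceptions */
        break;
    }

    /* Safe arithmetic: every operand must be safe as well. */
    for (int i = 0; i < x->n_ops; i++)
        if (!toy_may_drop_inner_mask(x->ops[i]))
            return false;
    return true;
}
```

Under this check the FMA from the dump is rejected twice over: it is a floating-point operation, and its MEM operand is not known to be non-trapping.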
In theory, if the second argument of both the inner and the outer VEC_MERGE is a
CONST_VECTOR, and we could prove that feeding that constant into the operation
always yields that same value, we could optimize away the outer VEC_MERGE
rather than the inner one, but I guess with floating-point ops, signed zeros
etc. even that might be hard.
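A small C check illustrates why the constant-feeding idea is fragile even for a plain multiplication by the all-zeros CONST_VECTOR from the dump: 0.0 * -2.0 yields -0.0 and 0.0 * inf yields NaN, and neither is bit-identical to 0.0. The helper names and sample operands are assumptions chosen for illustration; an actual proof would have to hold for every possible operand, not just a few samples.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if C * Y is bit-for-bit equal to C.  memcmp is used rather than ==
   because -0.0 == 0.0 compares equal and NaN never compares equal. */
static bool mult_preserves_constant(double c, double y)
{
    double r = c * y;
    return memcmp(&r, &c, sizeof r) == 0;
}

/* Hypothetical sample operands: positive (0.0 stays 0.0), negative
   (gives -0.0) and infinite (gives NaN). */
static const double sample_y[] = { 2.0, -2.0, INFINITY };

/* Counts the sample operands that leave C unchanged.  A constant that
   fails even one sample cannot be used to drop the outer merge. */
static int count_preserving(double c)
{
    int n = 0;
    for (size_t i = 0; i < sizeof sample_y / sizeof sample_y[0]; i++)
        if (mult_preserves_constant(c, sample_y[i]))
            n++;
    return n;
}
```

For c = 0.0 only the positive sample preserves the constant, so even this simplest operation fails the "same value always" requirement.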

So, shall we just revert that commit until it is fixed, or is there an easy way
to avoid doing it in the problematic cases?
