https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463

--- Comment #35 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #34)
> Possibly first computing a lattice val for each SSA name whether its origin
> is a "real" or a "imag" component of a complex load could get us meta but
> even then the individual sorting which determines the initial association to
> SLP nodes would be only possible to adjust "cross-lane" (and to what?  I
> guess combine real+imag parts?  Hopefully of the same entity).  Into vect we
> get with
> 
>   _19 = REALPART_EXPR <*_3>;
>   _18 = IMAGPART_EXPR <*_3>;
>   _5 = a_14(D) + _2;
>   _23 = REALPART_EXPR <*_5>; // real
>   _24 = IMAGPART_EXPR <*_5>; // imag
>   _26 = b$real_11 * _23; // real?
>   _27 = _24 * _53; // imag?
>   _28 = _23 * _53;  // mixed?  but fed into imag
>   _29 = b$real_11 * _24; // mixed?
>   _7 = _18 - _28;  // mixed? or imag?
>   _22 = _27 - _26;  // mixed?
>   _32 = _19 + _22;  // mixed?  or real?
>   _33 = _7 - _29; // mixed?  but fed into real?
>   REALPART_EXPR <*_3> = _32;
>   IMAGPART_EXPR <*_3> = _33;
> 
> so not sure if that will help.  That we'd like to have full load groups
> is unfortunately only visible a node deeper.  We could also fill a lattice
> with group IDs but we'd need to know parts to identify lane duplicates vs.
> full groups.  It's also a lot of hassle for not much gain and very special
> cases?

That should help, because all it's after is that the final loads be permuted.
The reason I'm keen to fix this is that it's not that niche: complex-add, being
just +/- with a permute, is by far the most common of these operations.
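
For illustration only, a minimal sketch of the shape meant by "just +/- with a
permute". This is not the reduced testcase for this PR; the function name,
signature and interleaved (re, im) float layout are assumptions made for the
example. It computes c = a + b * i per element, i.e. a complex add with a
90 degree rotation:

```c
#include <stddef.h>

/* Hypothetical example: c[i] = a[i] + b[i] * I, with complex values stored
   as interleaved (real, imag) float pairs.  Each lane is a plain add or
   subtract; the only complication is that b's real and imaginary parts are
   swapped, which is the load permute.  */
void
cadd_rot90 (float *restrict c, const float *restrict a,
	    const float *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      c[2 * i]     = a[2 * i]     - b[2 * i + 1]; /* real: a.re - b.im */
      c[2 * i + 1] = a[2 * i + 1] + b[2 * i];     /* imag: a.im + b.re */
    }
}
```

Whether the vectorizer turns a loop like this into a complex-add instruction
depends on the loads of b being seen as one permuted group, which is the point
above.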

Not recognizing this costs e.g. 10% on fotonik in SPECCPU2017 FP, which is also
a regression I'm trying to fix.

I can try to reduce a testcase for that to see if maybe that specific one is
easier to fix.  I'm just wondering if we can't do better in the future, e.g.
LLVM recognizes both fms180snd cases for instance.

If it's easier, I could see if we can just have another pattern to discover
fmul + fcadd?

That could maybe work and fix the SPEC regression... I need to make a testcase.
