This series addresses the problem that SLP_TREE_SCALAR_STMTS does not
give full scalar coverage of the SLP graph, most notably (but not only)
when patterns, and SLP patterns in particular, are involved.  This
results in workarounds in live lane analysis, double-costing there, and
imprecision in scalar costing.

Instead of trying to derive scalar coverage from SLP_TREE_SCALAR_STMTS,
the series basically re-does a simple "single-lane" SLP discovery
on the SSA graph, starting from the scalar SLP graph entry stmts, with
external SLP nodes determining the leaves.  To record coverage the series
repurposes STMT_SLP_TYPE, which now is only pure_slp or no_vect, to
mark the original scalar stmts (not pattern stmts) that are covered
(these are marked pure_slp with this patch).
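
To illustrate the idea of the coverage walk, here is a minimal standalone
sketch.  It is a toy model and not the code from the series: all toy_*
types and functions are hypothetical, stmts are plain indices, and each
operand carries a flag saying whether the matching SLP child is an
external node.  The walk starts at the entry stmts, marks each visited
stmt pure_slp and stops at external children.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: none of these types or functions exist in GCC.
enum class toy_slp_type { no_vect, pure_slp };

struct toy_operand {
  int def_stmt;         // index of the defining stmt, or -1 if there is none
  bool external_child;  // the matching SLP child is an external node
};

struct toy_stmt {
  std::vector<toy_operand> operands;
  toy_slp_type slp_type = toy_slp_type::no_vect;
};

// Mark every scalar stmt reachable from the entries as covered, stopping
// at operands whose SLP child is external.
void toy_mark_coverage(std::vector<toy_stmt> &stmts,
                       const std::vector<int> &entries)
{
  std::vector<int> worklist(entries);
  while (!worklist.empty()) {
    int idx = worklist.back();
    worklist.pop_back();
    assert(idx >= 0 && static_cast<std::size_t>(idx) < stmts.size());
    toy_stmt &s = stmts[idx];
    if (s.slp_type == toy_slp_type::pure_slp)
      continue;  // already visited
    s.slp_type = toy_slp_type::pure_slp;
    for (const toy_operand &op : s.operands)
      if (!op.external_child && op.def_stmt >= 0)
        worklist.push_back(op.def_stmt);
  }
}

// Four stmts: 0 is a load, 1 adds stmt 0 and stmt 3, 2 stores stmt 1
// (the entry), 3 is only reached through an external SLP child.
std::vector<toy_stmt> toy_example()
{
  std::vector<toy_stmt> stmts(4);
  stmts[1].operands = {{0, false}, {3, true}};
  stmts[2].operands = {{1, false}};
  toy_mark_coverage(stmts, {2});
  return stmts;
}
```

In this toy graph stmts 0, 1 and 2 end up marked pure_slp while stmt 3
stays no_vect, because the walk never looks through the external child.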

I've introduced a 'slp_oprnds' class as a start to marshal the GIMPLE
stmt operands <-> SLP node children mapping, with the idea to re-use
this for an actual single-lane SLP graph build for loop vectorization,
both to ease root discovery there and to serve as a starting point for
the longer-term alternate SLP discovery (merging nodes from a single SLP
graph rather than greedy discovery).  That class is likely going to
change as that evolves.

The series first changes STMT_SLP_TYPE to be scalar coverage for BB
vectorization (it's actually unused for loop vectorization).  Then
it simplifies BB live statement marking using it.  Then it replaces
the scalar coverage code in BB vectorization costing, actually
solving PR124222.  Finally (somewhat unrelated), it improves
BB vectorization live lane generation: instead of requiring that we
can code generate from every SLP use of the live scalar stmt, it
requires only one such use, costs only that one, and code-generates
from exactly that one.  This is not yet able to solve the fallbacks in
actual code generation - I have updated the comments to mention the
actual testcases FAILing.  What is still missing is committing to a
schedule (that is, recording a gsi on each SLP node where we insert
vectorized stmts) that we could use to verify upfront that the inserted
vector stmts reach all original scalar uses (or, in turn, to make sure
the schedule is arranged to allow that).
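
As a rough illustration of the "pick one SLP use" rule for live lanes,
consider the following standalone sketch.  It is a toy model, not the
code from the series: the toy_* names are hypothetical, and taking the
first usable SLP use is an assumption made for the example (the series
only requires that exactly one use is chosen).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: none of these types or functions exist in GCC.
struct toy_live_use {
  int node;          // id of the SLP node using the live scalar stmt
  int lane;          // lane of the stmt within that node
  bool can_extract;  // whether we could code generate the live lane here
};

// Return the index of the one use to generate the live lane from, or -1
// if no use qualifies.  Choosing the first candidate is an assumption.
int toy_pick_live_use(const std::vector<toy_live_use> &uses)
{
  for (std::size_t i = 0; i < uses.size(); ++i)
    if (uses[i].can_extract)
      return static_cast<int>(i);
  return -1;
}

// Cost the lane extract once, for the chosen use only, rather than once
// per SLP use.
int toy_live_cost(const std::vector<toy_live_use> &uses, int extract_cost)
{
  assert(extract_cost >= 0);
  return toy_pick_live_use(uses) >= 0 ? extract_cost : 0;
}
```

With three uses of which only the second and third can be code generated
from, the sketch picks the second and accounts for a single extract.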

The series was bootstrapped and tested, both in parts and as a whole,
on x86_64-unknown-linux-gnu and is now queued for pushing when
stage1 opens.

Feedback is still welcome, of course.

Thanks,
Richard.
