On 25/11/2025 13:12, Richard Biener wrote:
On Mon, Nov 24, 2025 at 8:01 PM Christopher Bazley <[email protected]> wrote:

To decide whether to create a new SLP instance for BB SLP,
vect_analyze_slp_instance will need the minimum number of lanes
in the SLP tree, which must not be less than the group size
(otherwise "unrolling" is required). All usage of max_nunits
is therefore replaced with a new class that encapsulates
both minimum and maximum.

For now, the minimum value is unused.

Tracking minimum and maximum nunits on each SLP node is overkill
and the way we accumulate those to a global minimum/maximum on
an SLP graph entry is not suitable to compute "unrolling" or a split
point on the entry node group-size.  I'd like to see us do away with
tracking nunits at all.

In fact the maybe_ne (unrolling_factor, 1U) code in vect_analyze_slp_instance
and vect_build_slp_instance is currently unreachable due to the check
already present in vect_record_max_nunits.

The only other use in vect_update_slp_vf_for_node should be replaced by
the local SLP_TREE_VECTYPE, like with the attached patch.  I've put
this to do "later" for some time now because as long as we had both SLP
and non-SLP the vect_maybe_update_slp_op_vectype code is not
really taking advantage of such local decisions and it gets it "wrong" for
invariants for example in vectorizable_conversion, leading to ICEs
in gcc.dg/vect/O3-pr87546.c and gcc.dg/vect/O3-vect-pr32243.c.
This could be mitigated locally in vect_update_slp_vf_for_node,
but the real fix is to not require extra unrolling because of constant/invariant
nodes.

That said, I really want to get rid of max_nunits, not add to it.  Instead
for the purpose of splitting and validating vector type validity for BB
vectorization I'd resort to a walk over the graph like
vect_update_slp_vf_for_node
does, deciding on splitting and predication.  This change to track not only
maximum but minimum nunits goes in the wrong direction.  For the
purpose of this patch I suggest to compute what you need for the "minimum"
by a new SLP graph walk instead.

Thanks for explaining this.

I will drop my patch to track the minimum and maximum number of lanes from v5 of this patch series. In its place, I will substitute a new function that is invoked by vect_analyze_slp_instance to compute the minimum number of subparts for all of the vector types used in an SLP tree for which an SLP instance is about to be created.

--
Christopher Bazley
Staff Software Engineer, GNU Tools Team.
Arm Ltd, 110 Fulbourn Road, Cambridge, CB1 9NJ, UK.
http://www.arm.com/

Reply via email to