https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121766

--- Comment #4 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
The original change happened because with the cost model disabled we started
costing inductions again and stopped costing truncations.

Not costing truncations is just a missing feature, but I think the
reproducer has been reduced too far.

With -msve-vector-bits=128 the Adv. SIMD code handles 16 bytes per iteration
and uses one fewer store slot, while the SVE code in the example handles 8.

Benchmarking that loop confirms it. The Adv. SIMD loop is much faster than the
SVE one on every SVE core.

So while there is a costing gap wrt the truncating stores, the codegen is
correct for the given example loop.

Is it perhaps reduced too much?
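
To make the 16 vs. 8 bytes per iteration point concrete, here is a minimal
sketch of a loop with a truncating store. This is NOT the testcase from the
report; it assumes 16-bit inputs narrowed to 8-bit outputs, and the function
name narrow_store is hypothetical. With 128-bit vectors, an SVE truncating
store of .h elements (st1b) writes 8 bytes per vector, whereas Adv. SIMD can
narrow two input vectors into one and write 16 bytes with a single store.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example, not the loop from the bug report.
   Each 16-bit input is truncated to its low 8 bits and stored.  */
void
narrow_store (uint8_t *restrict dst, const uint16_t *restrict src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (uint8_t) src[i];   /* truncating store: keeps the low byte */
}
```

Comparing the vectorized loop bodies produced for such a loop with and without
SVE enabled (at -msve-vector-bits=128) is one way to check how many bytes each
variant stores per iteration.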
