Tamar Christina <[email protected]> writes:
>> -----Original Message-----
>> From: Richard Sandiford <[email protected]>
>> Sent: 06 May 2026 17:01
>> To: Richard Biener <[email protected]>
>> Cc: [email protected]; Tamar Christina <[email protected]>;
>> [email protected]
>> Subject: Re: [PATCH 2/2] [x86] adjust OMP SIMD call cost
>> 
>> Richard Biener <[email protected]> writes:
>> > The following adds special handling to OMP SIMD vector call costs
>> > which were not costed at all and for which a single simple vector
>> > stmt isn't appropriate.  PR125174 shows that even when AVX imposes
>> > more overhead (from also slightly bogus costing) than SSE, when
>> > there's two OMP SIMD calls involved doing less of those should trump
>> > that.
>> >
>> > Bootstrap & regtest ongoing on x86_64-unknown-linux-gnu.
>> >
>> > I've verified this resolves the observed 465.tonto regression.  I
>> > thought about catching all OMP SIMD vectorized stmts but then
>> > realized scalar costing doesn't see this yet so we'll make all
>> > vectorizations unprofitable.  We cannot handle all calls this
>> > way either, as some directly expand to native insns (popcount, etc.).
>> > So I fear we have to maintain a positive list.  There's 52
>> > 'notinbranch' SIMD declarations in glibc 2.38 on x86_64, probably
>> > different ones on ARM.
>> >
>> > Also we of course have no idea about actual cost of the call
>> > (but it's expensive).  Nor do we have an idea of the scalar
>> > vs. vector cost.
>> >
>> > But as the PR shows, doing "nothing" isn't an option, at least
>> > when, like on x86 there's both SSE and AVX variants and the
>> > surrounding code would make the SSE variant (appear) cheaper.
>> >
>> > Any good ideas?
>> >
>> > Otherwise I'll try to extensively cover all libm builtins
>> > (anticipating future SIMD-ification) in the same way, with
>> > same costs.
>> 
>> Probably a daft question, but: if the assumption is that OMP SIMD calls
>> are very expensive (which I agree is reasonable!), then would there be
>> any important cases in which we'd want to pick an SSE loop with OMP SIMD
>> calls over an AVX loop with OMP SIMD calls?
>
> I think this is the same as the Adv.SIMD vs SVE example in my reply
> https://godbolt.org/z/cbGM5K8fn
>
> SVE will always have an additional overhead because of the predicates.
> So when ncopies > 1 on same VL Adv.SIMD should always be faster.
>
> On different VLs it's more complicated...
>
>> 
>> I just wonder whether comparing "number of OMP SIMD CALLs / estimated VF"
>> would get us most of the way there, falling back to the current cost
>> comparison when the ratios are equal.
>
> I guess the problem here is it also depends on the lifetime of the calls.
>
> In my examples above one of the reasons it's more expensive is because
> of how the calls are materialized.  I think the ncopies > 1 ones are
> more problematic over counting OMP SIMD CALLS because they
> artificially keep values live across the second call.
>
> i.e. in https://godbolt.org/z/cbGM5K8fn there's no reason for `z23` to
> be kept live because the OMP calls are marked pure and const (from
> what I remember) so couldn't affect memory anyway.

Yeah, it sounds like there are two aspects to it: the cost of the
scaffolding needed to make the call and the cost of the call(ee)
itself.  I agree that the cost of the scaffolding should be part
of the normal costing process and should be used to distinguish
Adv SIMD and like-sized SVE.  But it sounded like the x86 example
was more about the cost of the call(ee): if the OMP SIMD call is
assumed to do a lot of work, doing it twice for half-sized vectors
is a clear loss.  (Might have misunderstood though :) )

The problem with trying to cost the call(ee) based on (say) the
likely number of instructions is that it falls down when costing
user-defined functions.  I think we'd need a fallback approach
for that case, even if we can do better for well-known functions.

That's why I was thinking of having "number of OMP SIMD CALLs / estimated
VF" as a first-level comparison.  But maybe adding a magic param
to the normal costing process is good enough in practice.  I suppose
either way is going to have corner cases that do the wrong thing...
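
To make the first-level comparison concrete, here is a rough sketch of
the idea.  All names are made up for illustration (this is not existing
vectorizer code), and the fallback is only a simplified stand-in for the
current cost comparison:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of one candidate vector loop; not a GCC type.
struct candidate
{
  uint64_t simd_calls;    // OMP SIMD calls per vector iteration
  uint64_t estimated_vf;  // estimated scalar iterations per vector iteration
  uint64_t body_cost;     // body cost per vector iteration (stand-in value)
};

// Return true if candidate A should be preferred over candidate B.
bool
better_candidate_p (const candidate &a, const candidate &b)
{
  assert (a.estimated_vf > 0 && b.estimated_vf > 0);

  // First level: compare simd_calls / estimated_vf, cross-multiplied
  // so that no division or rounding is involved.
  uint64_t a_calls = a.simd_calls * b.estimated_vf;
  uint64_t b_calls = b.simd_calls * a.estimated_vf;
  if (a_calls != b_calls)
    return a_calls < b_calls;

  // Ratios are equal: fall back to comparing body cost per scalar
  // iteration, again cross-multiplied.
  return a.body_cost * b.estimated_vf < b.body_cost * a.estimated_vf;
}
```

With two calls per vector iteration in both loops, a VF 4 loop beats a
VF 2 loop at the first level regardless of body_cost, since it makes
half as many calls per scalar iteration.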

Thanks,
Richard
