[clang] [AMDGPU] Use generic builtins for `wave_reduce` ops (PR #179589)

Alex Voicu via cfe-commits Sun, 08 Feb 2026 15:47:16 -0800

AlexVlx wrote:

> > Why? Again, the algorithm itself is generic, you're just making life 
> > slightly harder for the user for no clear benefit, why would a HLL user 
> > care that the **intrinsic** that gets selected is reduce_fmin or 
> > reduce_fmax?
> 
> They have very different behavior. FP and integer operations are really not 
> related operations, not even algebraically. e.g., integer min is associative 
> and fmin is not.
>


Thank you for explaining, this is a very good point.

> For the min/max case, there's at least 3 different pairs of operations, with 
> an additional modifier bit for signed zero handling. It needs to be explicit 
> what the behavior is. It is quite important which nan behavior you are 
> getting. If you build these on top of minnum or minimumnum, any nan elements 
> will disappear (unlike every other operation). If it's minimum, you're going 
> to get worse codegen but any nan in any lane will give a nan result. On top 
> of that is the hopeless signaling nan problem, but that's besides the point.
>

All true, but I'm struggling to see how any of that is the addressed in any 
meaningful way today, unless I am missing something rather obvious. Explicitly 
suffixing these `_f##` doesn't help at all with describing NaN behaviour, or 
what the actual comparator used is. If you would be so kind as to have a glance 
at the first paragraph of my reply to @shiltian, or, more specifically, the 
final phrase there, I do suggest a solution which'd entail taking an extra 
argument that specifies FP behaviour. If anything, pointing out these aspects 
further reinforces the need to expose these in Clang as generic parametrisable 
interfaces, lest we end up in an even poorer state, with even more suffixed 
interfaces specifying e.g. the comparator, signed zero handling etc., and with 
no particularly reasonable immediately obvious default.
 
> If you're going to make this breaking change, I would strongly advise 
> eliminating the fmin/fmax name, and going with minnum and minimum
>
I am a bit confused. fmin and fmax apply to the intrinsics, not to the builtins 
being introduced. Given the point above I suspect you envision reworking the 
existing intrinsics themselves, but that'd be a separate piece of work.

> Also the title should still be fixed to state what this is doing, and munging 
> the current target builtins into type generic builtins not generic in the 
> typical target sense
>
I'm not sure I follow what generic in the typical target sense is meant to 
mean, but generic itself is an overloaded term and we might be relying on 
different sources for our nomenclature. Whilst I would object to naming this 
"the great munging", I'm sure I can reword it in a less open to interpretation 
way, thank you for the suggestion.



https://github.com/llvm/llvm-project/pull/179589
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [AMDGPU] Use generic builtins for `wave_reduce` ops (PR #179589)

Reply via email to