arsenm wrote:

> @addmisol -- friendly heads-up: we're tracking a regression that bisects 
> cleanly to this PR.
> 
> In HIP code on `amdgcn` targets, FP16 wavefront reductions 
> (`__reduce_{min,max}_sync<__half>`) return wrong values after this change. 
> The root cause appears to be: HIP's `__half` (`struct { union { _Float16 } 
> }`) is now classified by `ABIArgInfo::getDirect()` as an aggregate, while 
> pre-built `rocm-device-libs` still exposes `__ockl_wfred_*_f16` with a flat 
> `half` parameter. The resulting call-site / callee signature mismatch blocks 
> inlining of a `convergent` wave-reduction helper, which then runs under a 
> divergent-call EXEC mask and reads stale data from neighbour lanes via DPP.
> 

Device libs should *only* be used if they were built with the same freshly built compiler.
This is user error.

https://github.com/llvm/llvm-project/pull/185083
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
