karolherbst wrote:

Okay, seems like the division is causing problems:

```diff
diff --git a/libclc/clc/lib/generic/math/clc_atan2.inc b/libclc/clc/lib/generic/math/clc_atan2.inc
index f8e7c9638180..99929acf1e80 100644
--- a/libclc/clc/lib/generic/math/clc_atan2.inc
+++ b/libclc/clc/lib/generic/math/clc_atan2.inc
@@ -22,7 +22,8 @@ _CLC_OVERLOAD _CLC_CONST _CLC_DEF __CLC_FLOATN __clc_atan2(__CLC_FLOATN y,
   __CLC_FLOATN v = __clc_fmin(ax, ay);
   __CLC_FLOATN u = __clc_fmax(ax, ay);
 
-  __CLC_FLOATN vbyu = v / u;
+  __CLC_FLOATN s = u > 0x1.0p+96f ? 0x1.0p-32f : 1.0f;
+  __CLC_FLOATN vbyu = s * (v / (s * u));
 
   __CLC_FLOATN a = __clc_atan_reduced(vbyu);
 
```

The old version did this, and restoring it fixes the issue for me. Please be
aware that many drivers implement the division `a / b` as `a * (1.0 / b)`,
because the hardware lacks a native `fdiv`, with some scaling to account for
big/small numbers (like nvidia and rusticl do). And divisions in CLC only have
a guaranteed precision of 2.5 ULP (FULL_PROFILE) or 3.0 ULP (EMBEDDED_PROFILE)
anyway.

I'm seeing that in my case divide has a max error of 2.0 ULP, but it could also
run into denorms given the big input, which the scaling _should_ prevent,
though...
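
To illustrate the denorm concern, here is a minimal C sketch of a divide lowered to a reciprocal multiply. It is a model under stated assumptions, not what any particular driver does: `flush_denorm` and `recip_div` are hypothetical helpers that assume subnormal results are flushed to zero and that the lowering applies no range scaling of its own. `ratio_scaled` applies the same pre-scaling as the patch above.

```c
#include <math.h>

/* Hypothetical model: flush subnormal values to zero (assumed hardware
 * behaviour for this sketch only). */
static float flush_denorm(float x) {
    return (x != 0.0f && fabsf(x) < 0x1.0p-126f) ? copysignf(0.0f, x) : x;
}

/* Hypothetical model of a / b lowered to a * (1 / b) without any scaling. */
static float recip_div(float a, float b) {
    float r = flush_denorm(1.0f / b); /* subnormal once b > 2^126 */
    return flush_denorm(a * r);
}

/* v / u as in the line removed by the patch. */
static float ratio_plain(float v, float u) {
    return recip_div(v, u);
}

/* v / u with the pre-scaling from the patch: for large u, divide by s * u
 * instead, then multiply the quotient by s to undo the scaling. */
static float ratio_scaled(float v, float u) {
    float s = u > 0x1.0p+96f ? 0x1.0p-32f : 1.0f;
    return s * recip_div(v, s * u);
}
```

With `u = v = 0x1.0p+127f`, the reciprocal `1 / u` is `0x1.0p-127f`, which is subnormal, so in this model `ratio_plain` returns 0 while `ratio_scaled` returns 1. For inputs below the `0x1.0p+96f` threshold the two functions agree.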

https://github.com/llvm/llvm-project/pull/188706