[I] [Bug] maximum/minimum/relu/clip do not propagate NaN (IEEE 754 violation) [tvm]

via GitHub Sat, 16 May 2026 23:46:15 -0700


wuyii8941 opened a new issue, #19579:
URL: https://github.com/apache/tvm/issues/19579


   
   ## Expected behavior
   
   `maximum(NaN, x)` should return `NaN`, following the NaN-propagating 
`maximum` operation of IEEE 754-2019 §9.6 and matching NumPy, PyTorch, JAX, 
and ONNX Runtime.
   
   `relu(NaN)` should return `NaN` (since relu = max(x, 0)).
   
   ## Actual behavior
   
   When NaN is the **first** operand of `T.max` / `T.min`, the result is the 
second operand instead of NaN. This affects `R.maximum`, `R.minimum`, 
`R.nn.relu`, and `R.clip`.
   
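   The root cause is that `T.max(a, b)` compiles to the x86 `maxss`/`maxps` 
instructions, which return the second source operand whenever either operand 
is NaN. A NaN in **src1** is therefore replaced by **src2**, while a NaN in 
**src2** passes through. The IEEE 754-2019 `maximum` operation instead returns 
NaN when either operand is NaN.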
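   
   The operand-order dependence can be illustrated without TVM. The sketch 
below is a plain-Python emulation, not TVM or LLVM code: `select_max` and 
`ieee_maximum` are hypothetical helper names, and `select_max` assumes the 
lowering behaves like `a > b ? a : b`. Signed zeros are ignored here.

```python
import math


def select_max(a: float, b: float) -> float:
    """Emulate `a > b ? a : b`. Any comparison with NaN is false, so b wins."""
    return a if a > b else b


def ieee_maximum(a: float, b: float) -> float:
    """NaN-propagating maximum: NaN if either operand is NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return select_max(a, b)


nan = math.nan
print(select_max(nan, 1.0))    # 1.0: a NaN in the first operand is dropped
print(select_max(1.0, nan))    # nan: a NaN in the second operand survives
print(ieee_maximum(nan, 1.0))  # nan: propagates regardless of operand order
```

   Only the first call loses the NaN, which matches the behavior reported 
above for NaN in the first operand.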
   
   ## Reproducer
   
   ```python
   import numpy as np
   import tvm
   from tvm import relax
   import tvm.relax.op as R
   from tvm.relax.transform import LegalizeOps
   
   bb = relax.BlockBuilder()
   a = relax.Var("a", relax.TensorStructInfo((4,), "float32"))
   b = relax.Var("b", relax.TensorStructInfo((4,), "float32"))
   with bb.function("main", [a, b]):
       with bb.dataflow():
           gv = bb.emit_output(bb.emit(R.maximum(a, b)))
       bb.emit_func_output(gv)
   mod = bb.finalize()
   
   pipeline = tvm.ir.transform.Sequential([LegalizeOps()])
   exe = tvm.relax.build(pipeline(mod), target="llvm")
   vm = tvm.relax.VirtualMachine(exe, device=tvm.cpu())
   
   A = np.array([np.nan, 1.0, np.nan, 0.0], np.float32)
   B = np.array([1.0, np.nan, np.nan, np.nan], np.float32)
   out = vm["main"](
       tvm.runtime.tensor(A, device=tvm.cpu()),
       tvm.runtime.tensor(B, device=tvm.cpu()),
   ).numpy()
   
   print(out)       # [1.  nan  nan  nan]  — element 0 is WRONG
   print(np.maximum(A, B))  # [nan nan nan nan]  — all NaN per IEEE 754
   ```
   
   The pattern is operand-order-dependent:
   
   | Expression | TVM | Expected (IEEE 754) |
   |---|---|---|
   | `max(NaN, 1.0)` | `1.0` | `NaN` |
   | `max(1.0, NaN)` | `NaN` | `NaN` |
   | `relu(NaN)` = `max(NaN, 0)` | `0.0` | `NaN` |
   | `clip(NaN, -1, 1)` | `1.0` | `NaN` |
   
   ## Affected operations
   
   ```python
   R.maximum(a, b)    # when a is NaN
   R.minimum(a, b)    # when a is NaN
   R.nn.relu(x)       # when x is NaN → returns 0
   R.clip(x, lo, hi)  # when x is NaN → returns hi
   ```
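   
   The `relu` and `clip` results follow from the same order-dependent max/min 
if the ops decompose as shown below. This is a plain-Python sketch with 
hypothetical helper names. The `clip` decomposition `max(min(x, hi), lo)` is 
an assumption chosen because it is consistent with the reported `hi` result; 
it has not been checked against the TVM lowering.

```python
import math


def select_max(a, b):
    # Emulates `a > b ? a : b`; comparisons with NaN are false, so b is returned.
    return a if a > b else b


def select_min(a, b):
    # Emulates `a < b ? a : b`; comparisons with NaN are false, so b is returned.
    return a if a < b else b


def relu(x):
    # relu(x) = max(x, 0) with x as the first operand.
    return select_max(x, 0.0)


def clip(x, lo, hi):
    # Assumed decomposition: clamp from above first, then from below.
    return select_max(select_min(x, hi), lo)


print(relu(math.nan))             # 0.0
print(clip(math.nan, -1.0, 1.0))  # 1.0
```

   With the opposite nesting, `min(max(x, lo), hi)`, the same emulation would 
return `lo` instead, so the observed value depends on how `clip` is lowered.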
   
   Not affected (correct NaN propagation):
   - `R.add`, `R.multiply`, `R.subtract`, `R.divide` — arithmetic propagates 
NaN correctly
   - `R.nn.leakyrelu` — uses comparison path, NaN propagates through multiply
   - `R.nn.silu`, `R.nn.gelu` — sigmoid/erf path propagates NaN
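   
   A NumPy sketch of why these ops keep the NaN. The formulas below are the 
textbook definitions, assumed here for illustration rather than taken from the 
TVM legalization code: in each case the NaN input reaches an arithmetic 
operation, so it cannot be dropped by a comparison.

```python
import numpy as np

x = np.array([np.nan, -2.0, 3.0], dtype=np.float32)

# Arithmetic: NaN + 1 is NaN.
added = x + np.float32(1.0)

# leakyrelu as select(x > 0, x, alpha * x): NaN > 0 is false,
# so the alpha * x branch is taken and NaN * alpha is still NaN.
alpha = np.float32(0.01)
leaky = np.where(x > 0, x, alpha * x)

# silu as x * sigmoid(x): exp(-NaN) is NaN, and NaN stays NaN through the division.
silu = x / (np.float32(1.0) + np.exp(-x))

for name, out in [("add", added), ("leakyrelu", leaky), ("silu", silu)]:
    print(name, "propagates NaN:", bool(np.isnan(out[0])))
```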
   
   ## Why this matters
   
   `relu` is one of the most widely used activation functions. When an 
upstream computation produces NaN (e.g., `inf - inf` after an overflow, or 
`0/0`), the NaN should propagate to signal the error. Instead, TVM's `relu` 
silently converts NaN to 0, making the error invisible:
   
   ```python
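   import numpy as np

   # Suppose an upstream overflow produces NaN in one element:
   x = np.array([[1.0, 2.0, np.nan, 4.0]], dtype=np.float32)
   print(np.maximum(x, 0).sum())  # NumPy relu: nan, correctly signals the problem
   # TVM's R.nn.relu on the same input sums to 7.0: the NaN silently disappeared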
   ```
   
   This can cause silently wrong results in production models, where NaN 
detection is a standard debugging and monitoring signal.
   
   ## Root cause
   
   In the lowered TIR, `maximum` becomes `T.max(a, b)`, which LLVM lowers to 
x86 `maxss`/`maxps`. These instructions return the second source operand 
whenever either operand is NaN, rather than following the IEEE 754-2019 
`maximum` rule of returning NaN if either operand is NaN.
   
   The fix would be to emit NaN-aware max/min, e.g.:
   ```
   select(isnan(a) | isnan(b), NaN, max(a, b))
   ```
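   
   The proposed `select` can be checked against NumPy on the reproducer 
inputs. This is a NumPy emulation only, not a TIR patch: `select_max` and 
`nan_aware_max` are hypothetical names, and `select_max` stands in for the 
current order-dependent lowering.

```python
import numpy as np


def select_max(a, b):
    # Emulates the order-dependent lowering: comparisons with NaN are false,
    # so the second operand is chosen whenever either operand is NaN.
    return np.where(a > b, a, b)


def nan_aware_max(a, b):
    # select(isnan(a) | isnan(b), NaN, max(a, b))
    either_nan = np.isnan(a) | np.isnan(b)
    return np.where(either_nan, np.float32(np.nan), select_max(a, b))


A = np.array([np.nan, 1.0, np.nan, 0.0], np.float32)
B = np.array([1.0, np.nan, np.nan, np.nan], np.float32)

print(select_max(A, B))     # element 0 is 1.0, the rest are NaN
print(nan_aware_max(A, B))  # all NaN, same as np.maximum(A, B)
```

   The extra `isnan` checks add work per element, so a real fix may want to 
apply them only for floating-point dtypes.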
   
   ## Environment
   
   - TVM commit: 0b0afd8dd (main, 2026-04-24)
   - OS: Ubuntu 20.04
   - Target: llvm (CPU, x86-64)
   




[I] [Bug] maximum/minimum/relu/clip do not propagate NaN (IEEE 754 violation) [tvm]
