This is an automated email from the ASF dual-hosted git repository.
bohan pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git
The following commit(s) were added to refs/heads/main by this push:
new 717c822a80 [FIX] Fix cumsum kernel sblock_alloc_buffer for non-sblock buffer (#18887)
717c822a80 is described below
commit 717c822a800c9a8c699f89e01889da5f74ddcd72
Author: Tianqi Chen <[email protected]>
AuthorDate: Sat Mar 7 13:44:54 2026 -0500
[FIX] Fix cumsum kernel sblock_alloc_buffer for non-sblock buffer (#18887)
## Summary
- Fix `gpu_2d_continuous_cumsum`, which allocated the `Tmp` buffer with
`T.sblock_alloc_buffer` even though `Tmp` is used across multiple kernel
launches rather than within a single sblock. The allocation now uses
`T.alloc_buffer`.
- `T.sblock_alloc_buffer` places the buffer in SBlock metadata, so after
the AllocBuffer/DeclBuffer refactor, later references to the buffer
dimensions (used by `ceil_log2`) were left undefined.
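For context, the dimension-dependent arithmetic mentioned above can be sketched on the host side in plain Python. This is only an illustration of the `ceil_log2 // LOG_BLOCK_N` expression visible in the diff, not the TVMScript kernel itself: the value of `LOG_BLOCK_N` below is an assumption chosen for the example, and `total_rounds_for` is a hypothetical helper name.

```python
import math

# Assumed for illustration only: log2 of a 512-element block.
# The real value is defined elsewhere in cumsum.py and is not shown in this diff.
LOG_BLOCK_N = 9


def total_rounds_for(n: int) -> int:
    """Host-side mirror of `ceil_log2 // LOG_BLOCK_N` for a row length n >= 1.

    In the kernel, n is a symbolic buffer dimension, which is why the
    expression breaks if that dimension is no longer defined.
    """
    ceil_log2 = math.ceil(math.log2(n))
    return ceil_log2 // LOG_BLOCK_N


if __name__ == "__main__":
    for n in (1, 256, 512, 2**18):
        print(n, total_rounds_for(n))
```

Under the assumed block size, a row of 512 elements needs one round and a row of 2**18 elements needs two, so the round count depends directly on the symbolic dimension `n` bound by the buffer shapes.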
Fixes #18885
---
python/tvm/relax/backend/gpu_generic/cumsum.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/python/tvm/relax/backend/gpu_generic/cumsum.py b/python/tvm/relax/backend/gpu_generic/cumsum.py
index 354171fcb1..ae2060175c 100644
--- a/python/tvm/relax/backend/gpu_generic/cumsum.py
+++ b/python/tvm/relax/backend/gpu_generic/cumsum.py
@@ -158,7 +158,7 @@ def gpu_2d_continuous_cumsum(
m, n = T.int64(), T.int64()
A = T.match_buffer(var_a, [m, n], dtype=in_dtype)
Out = T.match_buffer(var_out, [m, n], dtype=out_dtype)
- Tmp = T.sblock_alloc_buffer([m, n], dtype=out_dtype)
+ Tmp = T.alloc_buffer([m, n], dtype=out_dtype)
ceil_log2 = T.Cast("int64", T.ceil(T.log2(T.Cast("float32", n))))
total_rounds = ceil_log2 // LOG_BLOCK_N