This is an automated email from the ASF dual-hosted git repository.

bohan pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
     new 717c822a80 [FIX] Fix cumsum kernel sblock_alloc_buffer for non-sblock buffer (#18887)
717c822a80 is described below

commit 717c822a800c9a8c699f89e01889da5f74ddcd72
Author: Tianqi Chen <[email protected]>
AuthorDate: Sat Mar 7 13:44:54 2026 -0500

    [FIX] Fix cumsum kernel sblock_alloc_buffer for non-sblock buffer (#18887)
    
    ## Summary
    
    - Fix `gpu_2d_continuous_cumsum`, which allocated the `Tmp` buffer with
    `T.sblock_alloc_buffer` even though `Tmp` is used across multiple kernel
    launches rather than within a single sblock. It now uses `T.alloc_buffer`.
    - `T.sblock_alloc_buffer` places the buffer in SBlock metadata, making
    subsequent references to buffer dimensions (used by `ceil_log2`)
    undefined after the AllocBuffer/DeclBuffer refactor.
    
    Fixes #18885
---
 python/tvm/relax/backend/gpu_generic/cumsum.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/tvm/relax/backend/gpu_generic/cumsum.py b/python/tvm/relax/backend/gpu_generic/cumsum.py
index 354171fcb1..ae2060175c 100644
--- a/python/tvm/relax/backend/gpu_generic/cumsum.py
+++ b/python/tvm/relax/backend/gpu_generic/cumsum.py
@@ -158,7 +158,7 @@ def gpu_2d_continuous_cumsum(
         m, n = T.int64(), T.int64()
         A = T.match_buffer(var_a, [m, n], dtype=in_dtype)
         Out = T.match_buffer(var_out, [m, n], dtype=out_dtype)
-        Tmp = T.sblock_alloc_buffer([m, n], dtype=out_dtype)
+        Tmp = T.alloc_buffer([m, n], dtype=out_dtype)
         ceil_log2 = T.Cast("int64", T.ceil(T.log2(T.Cast("float32", n))))
         total_rounds = ceil_log2 // LOG_BLOCK_N
 

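The two scalar expressions in the hunk context, `ceil_log2` and `total_rounds`, both depend on `n`, the second dimension of the buffers. A minimal plain-Python sketch of that arithmetic is below. It is not TVMScript and not the kernel itself: `rounds_for_width` is a hypothetical helper, and the value of `LOG_BLOCK_N` is an assumption chosen for illustration, since the real constant is defined outside this diff.

```python
import math

# Hypothetical value for illustration only; the real LOG_BLOCK_N is defined
# elsewhere in cumsum.py and does not appear in this diff.
LOG_BLOCK_N = 8


def rounds_for_width(n, log_block_n=LOG_BLOCK_N):
    """Evaluate the hunk's two scalar expressions for a concrete width n."""
    if n < 1:
        raise ValueError("n must be a positive extent")
    # Kernel: T.Cast("int64", T.ceil(T.log2(T.Cast("float32", n))))
    # Python floats are float64, so very large n may round differently
    # from the float32 cast used in the kernel.
    ceil_log2 = int(math.ceil(math.log2(float(n))))
    # Kernel: ceil_log2 // LOG_BLOCK_N
    total_rounds = ceil_log2 // log_block_n
    return ceil_log2, total_rounds
```

Under the assumed `LOG_BLOCK_N = 8`, `rounds_for_width(1024)` evaluates `ceil_log2` to 10 and `total_rounds` to 1. If `n` cannot be resolved, neither value can be computed, which matches the failure described in the commit message.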