tqchen opened a new pull request, #19610:
URL: https://github.com/apache/tvm/pull/19610

   ## Summary
   
   A PrimFunc with multiple sibling `thread_extent` blocks (i.e. multiple kernel launches in one function) hits a scoping bug in `MergeSharedMemoryAllocations`. The merged buffer is allocated only inside the body of the first `thread_extent`, but accesses in the later `thread_extent`s are still rewritten to reference it. When `SplitHostDevice` partitions those `thread_extent`s into separate device functions, the second device function references an undefined var. This leads to an internal compiler error (ICE) during codegen or a crash at runtime. The bug was exposed by PR #19605, which moved the pass earlier in the pipeline (before `SplitHostDevice`).
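The failure mode above can be reproduced in a toy model. The sketch below is purely illustrative and does not use TVM's IR or the real pass. It assumes a hypothetical encoding in which each kernel launch is a list of tuples and a single function-wide merged buffer named `buf_dyn_shmem` (also hypothetical) is allocated only around the first launch. Treating each launch as its own device function, as `SplitHostDevice` would, leaves the second one with a dangling reference.

```python
# Toy model only: not TVM's IR or the actual pass. A kernel body is a list of
# tuples: ("alloc", name, nbytes) for a shared.dyn allocation, ("use", name) for
# an access, and ("merged_alloc", name, nbytes, body) for the merged buffer.

MERGED = "buf_dyn_shmem"  # hypothetical merged-buffer name


def buggy_merge(kernels):
    """Mimic the bug: one function-wide merged var, allocated only in kernel 0."""
    total = 0
    out = []
    for body in kernels:
        # Singleton state keeps accumulating across sibling launches.
        total += sum(s[2] for s in body if s[0] == "alloc")
        # Every access, in every launch, is redirected to the single merged var.
        out.append([("use", MERGED) for s in body if s[0] == "use"])
    # The merged allocation wraps only the first launch's body.
    out[0] = [("merged_alloc", MERGED, total, out[0])]
    return out


def undefined_vars(kernel):
    """Buffer vars a kernel uses but never defines (dangling after a split)."""
    defined, used = set(), set()

    def walk(stmts):
        for s in stmts:
            if s[0] in ("alloc", "merged_alloc"):
                defined.add(s[1])
            if s[0] == "merged_alloc":
                walk(s[3])
            elif s[0] == "use":
                used.add(s[1])

    walk(kernel)
    return used - defined


kernels = [
    [("alloc", "A_sh", 128), ("use", "A_sh")],
    [("alloc", "B_sh", 256), ("use", "B_sh")],
]
# Treat each list entry as one device function, as SplitHostDevice would.
split = buggy_merge(kernels)
print([undefined_vars(k) for k in split])  # [set(), {'buf_dyn_shmem'}]
```

The first launch is self-consistent. The second one uses `buf_dyn_shmem` without any allocation in its own subtree, which is the undefined var described above.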
   
   - Converts every per-launch singleton field in `SharedMemoryRewriter` (`merged_buf_var_`, `merged_alloc_size_`, `shmem_allocs_`, `buffer_byte_offsets_`, etc.) into a `KernelScope` struct held on a `scope_stack_`. A fresh scope is pushed on entry to the outermost `thread_extent`. The liveness, planning, offset-computation, rewrite, and wrap steps all run inside that scope, which is popped on exit.
   - Each kernel launch therefore gets its own merged buffer, allocated inside its own subtree. This preserves the `LowerDeviceKernelLaunch` invariant of at most one dynamic shared-memory (`shared.dyn`) `AllocBuffer` per kernel.
   - Adds `test_multi_thread_extent_blocks`, which exercises two sibling `thread_extent` blocks, each with independent `shared.dyn` allocations. The test includes an end-to-end check through `AnnotateDeviceRegions` + `SplitHostDevice`.
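The per-launch scoping described in the list above can be sketched in the same kind of toy model. This is a hypothetical illustration, not the actual `SharedMemoryRewriter` code. `KernelScope` and `scope_stack` here only mirror the names in the description, and the planning step simply packs allocations back to back with no liveness-based reuse. The merged-buffer names `buf_dyn_shmem_<i>` are assumptions. With flat sibling launches, the stack never holds more than one entry.

```python
from dataclasses import dataclass, field

# Toy IR (hypothetical, not TVM's): ("alloc", name, nbytes), ("use", name[, offset]),
# ("merged_alloc", name, nbytes, body).


@dataclass
class KernelScope:
    """Per-launch state that would otherwise live in singleton fields."""
    merged_var: str
    alloc_size: int = 0
    byte_offsets: dict = field(default_factory=dict)


class ScopedMerger:
    def __init__(self):
        self.scope_stack = []

    def visit_launch(self, body, launch_id):
        # Push a fresh scope when entering an outermost thread_extent.
        self.scope_stack.append(KernelScope(f"buf_dyn_shmem_{launch_id}"))
        try:
            scope = self.scope_stack[-1]
            # Plan: pack this launch's allocations back to back
            # (no liveness-based reuse in this toy).
            for s in body:
                if s[0] == "alloc":
                    scope.byte_offsets[s[1]] = scope.alloc_size
                    scope.alloc_size += s[2]
            # Rewrite: point accesses at this launch's merged var plus a byte offset.
            rewritten = [("use", scope.merged_var, scope.byte_offsets[s[1]])
                         for s in body if s[0] == "use"]
            # Wrap: allocate the merged buffer inside this launch's own subtree.
            return [("merged_alloc", scope.merged_var, scope.alloc_size, rewritten)]
        finally:
            # Pop on exit so no state leaks into the next sibling launch.
            self.scope_stack.pop()

    def run(self, kernels):
        return [self.visit_launch(body, i) for i, body in enumerate(kernels)]


def undefined_vars(kernel):
    """Buffer vars a kernel uses but never defines (dangling after a split)."""
    defined, used = set(), set()

    def walk(stmts):
        for s in stmts:
            if s[0] in ("alloc", "merged_alloc"):
                defined.add(s[1])
            if s[0] == "merged_alloc":
                walk(s[3])
            elif s[0] == "use":
                used.add(s[1])

    walk(kernel)
    return used - defined


def count_merged_allocs(kernel):
    """Number of top-level merged allocations in one launch."""
    return sum(1 for s in kernel if s[0] == "merged_alloc")


kernels = [
    [("alloc", "A_sh", 128), ("use", "A_sh")],
    [("alloc", "B_sh", 256), ("alloc", "C_sh", 64), ("use", "B_sh"), ("use", "C_sh")],
]
out = ScopedMerger().run(kernels)
for k in out:
    # Mirrors the "at most one dyn-shmem allocation per kernel" invariant.
    assert count_merged_allocs(k) <= 1
    # Every launch is self-contained, so splitting leaves no dangling vars.
    assert not undefined_vars(k)
print(out[1][0][:3])  # ('merged_alloc', 'buf_dyn_shmem_1', 320)
```

Popping in a `finally` block keeps the per-launch state from leaking into the next sibling even if the body visit fails. That is the property that separate device functions rely on.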


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]