tqchen opened a new pull request, #19610: URL: https://github.com/apache/tvm/pull/19610
## Summary

A PrimFunc with multiple sibling `thread_extent` blocks (multiple kernel launches in one function) hits a scoping bug in `MergeSharedMemoryAllocations`: the merged buffer is allocated only inside the first `thread_extent` body, but accesses in later `thread_extent` blocks are still rewritten to reference it. When `SplitHostDevice` partitions those `thread_extent` blocks into separate device functions, the second device function references an undefined var, leading to a codegen ICE or a runtime crash. This bug is exposed by PR #19605, which moved the pass earlier in the pipeline (before `SplitHostDevice`).

- Convert every per-launch singleton field (`merged_buf_var_`, `merged_alloc_size_`, `shmem_allocs_`, `buffer_byte_offsets_`, etc.) in `SharedMemoryRewriter` into a `KernelScope` struct held on a `scope_stack_`. Push a fresh scope on outermost `thread_extent` entry, run liveness/plan/offset-compute/rewrite/wrap inside that scope, and pop on exit.
- Each kernel launch ends up with its own merged buffer allocated inside its own subtree, preserving `LowerDeviceKernelLaunch`'s "at most one dyn-shmem `AllocBuffer` per kernel" invariant.
- Add `test_multi_thread_extent_blocks` to exercise two sibling `thread_extent` blocks, each with independent `shared.dyn` allocations, including an end-to-end check through `AnnotateDeviceRegions` + `SplitHostDevice`.
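
The scope-stack idea described in the summary can be sketched with a toy model in plain Python. This is not the TVM implementation and does not use any TVM API: `Alloc`, `ThreadExtent`, and `ToyRewriter` are hypothetical stand-ins, and only the `KernelScope` name and its field names are borrowed from the PR text. The sketch also assumes simple bump allocation rather than the liveness-based planning the real pass performs; it is only meant to show why per-launch state keeps each kernel's merged buffer inside its own subtree.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Alloc:
    """Toy stand-in for a `shared.dyn` allocation (hypothetical, not a TVM class)."""
    name: str
    nbytes: int


@dataclass
class ThreadExtent:
    """Toy stand-in for a thread_extent region; the outermost one is a kernel launch."""
    tag: str
    body: List["Node"] = field(default_factory=list)


Node = Union[Alloc, ThreadExtent]


@dataclass
class KernelScope:
    """Per-launch state; field names loosely follow the PR description."""
    merged_buf_var: str
    merged_alloc_size: int = 0
    buffer_byte_offsets: dict = field(default_factory=dict)


class ToyRewriter:
    """Hypothetical rewriter that keeps one KernelScope per outermost thread_extent."""

    def __init__(self):
        self.scope_stack = []  # active kernel scopes (at most one in this toy)
        self.finished = []     # scopes of kernels already processed
        self.next_id = 0

    def visit(self, node):
        if isinstance(node, ThreadExtent):
            outermost = not self.scope_stack
            if outermost:
                # Fresh scope on outermost thread_extent entry.
                self.scope_stack.append(KernelScope(f"merged_{self.next_id}"))
                self.next_id += 1
            for child in node.body:
                self.visit(child)
            if outermost:
                # Pop on exit so the next sibling launch starts clean.
                self.finished.append(self.scope_stack.pop())
        elif isinstance(node, Alloc) and self.scope_stack:
            # Assumption: bump allocation, no liveness-based reuse.
            scope = self.scope_stack[-1]
            scope.buffer_byte_offsets[node.name] = scope.merged_alloc_size
            scope.merged_alloc_size += node.nbytes


# Two sibling kernel launches, each with independent shared allocations.
func_body = [
    ThreadExtent("blockIdx.x", [
        ThreadExtent("threadIdx.x", [Alloc("A_sh", 256), Alloc("B_sh", 128)]),
    ]),
    ThreadExtent("blockIdx.x", [
        ThreadExtent("threadIdx.x", [Alloc("C_sh", 512)]),
    ]),
]

rewriter = ToyRewriter()
for stmt in func_body:
    rewriter.visit(stmt)
scopes = rewriter.finished
```

After the traversal, `scopes` holds one entry per launch, each with its own buffer name, total size, and offset table. With a single set of rewriter-wide fields instead of a stack, the second launch would have inherited the first launch's buffer name and offsets, which is the shape of the bug the PR describes.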
