LudovicoYIN opened a new pull request, #18671:
URL: https://github.com/apache/tvm/pull/18671

   ### Motivation
   InjectPTXLDG32 rewrites BufferStore when encountering if_then_else, but it only
   initializes temporary buffers when an Allocate node exists. For functions without
   Allocate, this leads to uninitialized buffers and a hard segfault during compilation.
   In addition, the PTX-only pass can run on CPU/LLVM targets when tir.ptx_ldg32=1,
   injecting PTX intrinsics that are invalid for non-CUDA codegen.
   
   This PR ensures temporary buffers are created even when no Allocate exists, and
   skips InjectPTXLDG32 on non-CUDA targets, preventing segfaults and invalid PTX
   intrinsics on CPU.
   
   ### Changes
   - Ensure temp buffers are created when the rewrite path is taken without Allocate
   - Insert allocations at the function level when needed
   - Guard InjectPTXLDG32 so it only runs on CUDA targets
   - Add tests for CUDA (insertion) and CPU (skip) behavior
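
The control flow these changes describe can be sketched in plain Python. This is a toy model only, not the TVM implementation: `ToyFunc`, `inject_ptx_ldg32`, and the buffer names `addr_tmp` and `predicate_tmp` are hypothetical stand-ins invented for illustration, and no TVM API is used.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFunc:
    """Hypothetical stand-in for a TIR function; not a TVM type."""
    target_kind: str               # e.g. "cuda" or "llvm"
    has_allocate: bool             # body contains an Allocate node
    has_if_then_else_store: bool   # body stores an if_then_else value
    allocations: list = field(default_factory=list)
    rewritten: bool = False


def inject_ptx_ldg32(func: ToyFunc, enabled: bool = True) -> ToyFunc:
    """Toy version of the guarded pass described in the change list."""
    # Guard: PTX intrinsics are only meaningful for CUDA codegen,
    # so the pass is a no-op on any other target.
    if not enabled or func.target_kind != "cuda":
        return func
    # Nothing to rewrite if no store matches the if_then_else pattern.
    if not func.has_if_then_else_store:
        return func
    # Create the temporary buffers on the rewrite path regardless of
    # whether an Allocate node was seen; previously this step depended
    # on Allocate being present. The names are placeholders.
    if not func.allocations:
        func.allocations.extend(["addr_tmp", "predicate_tmp"])
    func.rewritten = True
    return func


cpu = inject_ptx_ldg32(ToyFunc("llvm", False, True))
gpu = inject_ptx_ldg32(ToyFunc("cuda", False, True))
print(cpu.rewritten, cpu.allocations)
print(gpu.rewritten, gpu.allocations)
```

In this model the CPU function passes through unchanged, while the CUDA function without Allocate still gets its temporary buffers before the rewrite, which mirrors the skip and insertion cases the new tests are said to cover.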
   
   ### Testing
   test_tir_transform_inject_ptx_ldg32.py
   
   ### Fixes
   - https://github.com/apache/tvm/issues/18612
   - https://github.com/apache/tvm/issues/18617
   - https://github.com/apache/tvm/issues/18599


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

