Re: [PR] [BugFix][Relax] Fix scatter_elements and scatter_nd CUDA compilation [tvm]

via GitHub Sun, 03 May 2026 05:20:46 -0700


tlopex commented on PR #19497:
URL: https://github.com/apache/tvm/pull/19497#issuecomment-4366154224


   Thanks for working on this. I agree that fixing this in the scatter lowering 
is the right layer, and this is much closer to what I had in mind than a 
generic `T.parallel` cleanup pass.
   
   That said, I’d prefer the fix to restore a proper target-specific TOPI 
dispatch rather than make the generic `topi.scatter_elements` / 
`topi.scatter_nd` implementation target-aware internally.
   
   The root issue in #19451 is that Relax legalization lost the old CUDA 
scatter lowering path: the op is lowered through `te.extern`, and the generic 
IRBuilder emits bare `T.parallel`, which is not valid GPU TIR. Since this is a 
target-specific legality issue, I think the more fundamental fix is to add a 
GPU TOPI implementation, e.g. under `topi/gpu/`, and dispatch to it from 
`relax/transform/legalize_ops/manipulate.py` when the current target is GPU. 
That is similar in spirit to the old Relay CUDA scatter path that was removed 
during the Relay cleanup.
   
   Even if the CPU and GPU algorithms are mostly the same, the generated TIR is 
not target-neutral: GPU needs explicit `blockIdx/threadIdx` bindings, and 
reductions may need GPU-specific handling. Keeping that in a GPU-specific 
implementation avoids making the generic TOPI path depend on 
`Target.current()`, and it keeps `VerifyMemory` useful as a check that GPU 
lowering produced valid GPU TIR in the first place.
   
   So my preference would be:
   1. keep the generic TOPI scatter lowering target-neutral for CPU/default use;
   2. add `topi/gpu` scatter lowering that emits thread-bound TIR;
   3. dispatch to the GPU implementation from Relax legalization;
   4. add CUDA compile/build regression tests for both `scatter_elements` and 
`scatter_nd`.
   
   I don’t think we need the broad pass from #19363, but I would like this PR 
to fix the issue through explicit GPU scatter lowering rather than inline 
target checks in the generic implementation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [BugFix][Relax] Fix scatter_elements and scatter_nd CUDA compilation [tvm]

Reply via email to