MasterJH5574 opened a new pull request, #15330:
URL: https://github.com/apache/tvm/pull/15330

   This PR fixes a bug in the previous decode-GeMV dlight scheduling.
   
   Previously, when the inner dimension of the largest tensor was spatial, the
   fused epilogue block ended up not bound to any thread axis. This is
   incorrect: it generates wrong GPU code that produces wrong numerical
   results. After reverse-compute-at is applied to the epilogue block, at
   least one spatial axis remains, and that axis is supposed to be bound to
   threadIdx.
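   The failure mode can be illustrated with a simplified pure-Python model.
   This is an assumption made for illustration only: it is not TVM code and
   not the actual dlight schedule. It assumes one thread block in which thread
   t owns output element t of a spatial-inner GeMV and keeps its reduction
   result in private local storage. The names `gemv_partial`, `run_epilogue`,
   `NUM_THREADS`, and the bias-adding `epilogue` are hypothetical. Binding the
   remaining spatial axis to the thread index gives correct results; leaving
   it unbound makes every thread run the whole loop and read values it does
   not own.

```python
# Toy model (NOT TVM-generated code): one GPU thread block computing a decode
# GeMV whose inner (contiguous) axis is spatial. Thread t owns output element t,
# and after the reduction its result lives only in that thread's local storage.

NUM_THREADS = 4  # hypothetical block size; equals the number of outputs here


def gemv_partial(weight, vec):
    """Return per-thread local storage after the reduction stage.

    weight is K x N with N contiguous (spatial-inner). local[t] maps an output
    index to a value, but only holds the one element thread t owns (index t).
    """
    local = []
    for t in range(NUM_THREADS):
        acc = sum(weight[k][t] * vec[k] for k in range(len(vec)))
        local.append({t: acc})
    return local


def epilogue(x):
    # Hypothetical fused epilogue: add a bias of 1.0.
    return x + 1.0


def run_epilogue(local, bound_to_thread):
    out = [None] * NUM_THREADS
    for t in range(NUM_THREADS):  # simulate each thread, one after another
        if bound_to_thread:
            # Remaining spatial axis bound to threadIdx: thread t handles index t.
            indices = [t]
        else:
            # Unbound spatial axis: every thread executes the whole loop.
            indices = range(NUM_THREADS)
        for j in indices:
            # A thread only sees its own local storage; entries it does not
            # own are garbage, modeled here as 0.0.
            out[j] = epilogue(local[t].get(j, 0.0))
    return out


weight = [[1, 2, 3, 4], [5, 6, 7, 8]]  # K=2 rows, N=4 contiguous columns
vec = [1, 1]
local = gemv_partial(weight, vec)
print(run_epilogue(local, bound_to_thread=True))   # [7.0, 9.0, 11.0, 13.0]
print(run_epilogue(local, bound_to_thread=False))  # [1.0, 1.0, 1.0, 13.0]
```

   On real hardware the unbound threads race on the same outputs, so which
   wrong value survives is undefined; the sequential loop above simply lets
   the last thread win.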
   
   This PR fixes this issue and adds three test cases that cover both the
   reduction-inner and spatial-inner cases, with and without broadcasting.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.