MasterJH5574 opened a new pull request, #15330: URL: https://github.com/apache/tvm/pull/15330
This PR fixes a bug of the previous decode-GeMV dlight scheduling. Previously, when the inner dimension of the largest tensor is spatial, in the end the fused epilogue block was not bound to any thread axis, which is wrong and will generate wrong GPU code with wrong numerical results. That is because after doing reverse-compute-at of the epilogue block, there are at lease one remaining spatial axis, and such axis is supposed to be bound to threadIdx. This PR fixes this issue, and add three test cases which can cover both the reduction-inner and spatial-inner cases with or without broadcasting. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
