MasterJH5574 commented on pull request #10207: URL: https://github.com/apache/tvm/pull/10207#issuecomment-1035982910
Interesting. It looks like the perf improvement is small? The shuffle-down implementation beats the shared memory implementation only when `n = 4` 🤔

> Another thing worth noting is, we can only allow cross warp reduction by shuffle-down, thus warp size must be a multiple of blockDim.x when blockDim.y * blockDim.z != 1.

BTW, do we have this requirement in the codebase now?
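
For context, here is a rough Python model of the constraint in the quoted remark. It is only a sketch and not TVM's actual lowering: `shfl_down` and `warp_group_reduce` are hypothetical names, and `shfl_down` merely imitates the documented behaviour of CUDA's `__shfl_down_sync` with a `width` argument, where a lane whose source falls outside its `width`-sized segment gets its own value back. The assumption is that threads are linearized with `threadIdx.x` fastest, so each reduction group of `blockDim.x` threads occupies consecutive lanes, and it stays inside one warp only if `blockDim.x` is a power of two that divides the warp size.

```python
WARP_SIZE = 32  # assumption: NVIDIA-style warps of 32 lanes


def shfl_down(vals, offset, width):
    """Model one warp-wide shuffle-down: lane i reads lane i + offset,
    unless that would leave its width-sized segment, in which case it
    keeps its own value."""
    out = []
    for lane, own in enumerate(vals):
        if (lane % width) + offset < width:
            out.append(vals[lane + offset])
        else:
            out.append(own)
    return out


def warp_group_reduce(vals, width):
    """Sum each width-sized segment of a warp by repeated shuffle-down.
    Returns the value held by the first lane of every segment."""
    assert len(vals) == WARP_SIZE
    # The segment width (blockDim.x in the discussion) must be a power of
    # two that divides the warp size, otherwise a group would straddle warps.
    assert width > 0 and width & (width - 1) == 0 and WARP_SIZE % width == 0
    offset = width // 2
    while offset > 0:
        shifted = shfl_down(vals, offset, width)
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    # Only the first lane of each segment is guaranteed to hold the full sum.
    return [vals[start] for start in range(0, WARP_SIZE, width)]


# Example: blockDim.x = 8, so one warp holds four independent reduction groups.
group_sums = warp_group_reduce(list(range(WARP_SIZE)), width=8)
```

With `width=8`, the four entries of `group_sums` are the sums of lanes 0..7, 8..15, 16..23 and 24..31. A width such as 12 trips the assertion, which is the case the requirement above is meant to rule out.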
