tqchen commented on issue #5307: [TIR] Make lower_warp_memory support extent(threadIdx.x) < warp_size
URL: https://github.com/apache/incubator-tvm/pull/5307#issuecomment-612554853

Can we still translate to `__shfl(x, (threadIdx.x + 1) % 4, 4)` in the alternative approach, given that the pattern `(wi = warp_index)` directly corresponds to the related pattern?

I see. In this case, it would be great to give a bit more discussion and thought to the new programming model.

- Does that mean we are using a smaller "virtual warp"? (That brings the restriction of not being able to shuffle across a larger range of threads.)
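
To make the "virtual warp" question concrete, here is a minimal host-side sketch in Python that emulates how CUDA's shuffle treats its `width` argument: when `width` is smaller than the warp size, each group of `width` lanes behaves as a separate entity, and the source lane is taken modulo `width` within that group. The helper names `shfl` and `rotate_in_virtual_warp` are hypothetical and exist only for this illustration. They are not part of TVM or CUDA, and the sketch assumes a 1-D block holding a single 32-lane warp, so the lane id stands in for `threadIdx.x`.

```python
WARP_SIZE = 32  # hardware warp size assumed for this illustration


def shfl(values, src_lane_of, width=WARP_SIZE):
    """Emulate __shfl(var, srcLane, width) for one warp on the host.

    values[i] is the value of `var` held by lane i, and src_lane_of(i) is the
    srcLane expression as evaluated by lane i. (Hypothetical helper.)
    """
    assert len(values) == WARP_SIZE
    # CUDA requires width to be a power of two no larger than the warp size.
    assert 0 < width <= WARP_SIZE and width & (width - 1) == 0
    out = []
    for lane in range(WARP_SIZE):
        base = (lane // width) * width          # first lane of this sub-group
        src = base + src_lane_of(lane) % width  # srcLane wraps inside the group
        out.append(values[src])
    return out


def rotate_in_virtual_warp(values, width=4):
    """Emulate __shfl(x, (threadIdx.x + 1) % width, width) for every lane."""
    return shfl(values, lambda lane: (lane + 1) % width, width)


x = list(range(WARP_SIZE))
y = rotate_in_virtual_warp(x, 4)
print(y[:8])  # [1, 2, 3, 0, 5, 6, 7, 4]
```

With `width = 4`, every lane reads only from its own group of four lanes, which is the restriction raised above: a smaller virtual warp cannot exchange data with lanes outside its group through a single shuffle.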
