LeiWang1999 commented on PR #14329:
URL: https://github.com/apache/tvm/pull/14329#issuecomment-1475185873

   @andy-yang-1 Yeah, I noticed that there is a pass named ptx_ldg32, but it
seems to me that this pass is not complete yet, because ldg32 loads only 4 bytes
from global memory, and sometimes we need ldg64 and ldg128 for more efficient
data loads and stores on the GPU. As for the second consideration, I don't think
this PR should take it into account, partly because support for asynchronous
copy is already there in the current TensorIR, and partly because it is not a
pass that is enabled by default, so users need to manually annotate and enable
it. I think that will be more comfortable to handle in the Python interface.
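
To make the load-width point concrete, here is a rough sketch of the arithmetic only. The helper `loads_needed` is hypothetical and not part of TVM; it assumes the data is aligned to the load width and simply counts how many global-load instructions one thread would issue at each width.

```python
# Hypothetical helper (not a TVM API): number of global-load instructions
# one thread issues to fetch `num_bytes` with loads of `width_bits` bits.
def loads_needed(num_bytes: int, width_bits: int) -> int:
    width_bytes = width_bits // 8
    if num_bytes % width_bytes != 0:
        # Assumption: the data is sized and aligned for the chosen width.
        raise ValueError("num_bytes must be a multiple of the load width")
    return num_bytes // width_bytes


# Example: one thread fetching 8 float16 values (16 bytes).
for width in (32, 64, 128):
    print(f"ldg{width}: {loads_needed(16, width)} load(s)")
```

With only a 32-bit load available, the 16-byte fetch takes four instructions; a 128-bit load would cover it in one.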


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
