LeiWang1999 commented on PR #14329: URL: https://github.com/apache/tvm/pull/14329#issuecomment-1475185873
@andy-yang-1 Yes, I noticed there is a pass named ptx_ldg32, but it does not seem complete to me yet: ldg32 loads only 4 bytes from global memory, and sometimes we need ldg64 and ldg128 for more efficient data loads and stores on the GPU. As for the second consideration, I don't think this PR needs to take it into account, partly because asynchronous copy is already supported in the current TensorIR, and partly because this pass is not enabled by default, so users have to annotate and enable it manually. I think that is handled more comfortably through the Python interface.
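
To make the load-width point concrete, here is a rough back-of-the-envelope sketch. It is not TVM code. It only counts how many global-load instructions a single thread would issue for a contiguous chunk at each width. The helper `loads_needed` and the 8 x float16 chunk are hypothetical, chosen purely for illustration.

```python
# Illustrative arithmetic only; nothing here calls into TVM or CUDA.
# Bytes moved by one load instruction at each width (ldg32 = 4 bytes, as noted above).
LOAD_WIDTH_BYTES = {"ldg32": 4, "ldg64": 8, "ldg128": 16}


def loads_needed(chunk_bytes: int, width_bytes: int) -> int:
    """Number of loads of `width_bytes` needed to cover `chunk_bytes` (ceiling division)."""
    return -(-chunk_bytes // width_bytes)


# Hypothetical case: each thread fetches 8 contiguous float16 values (2 bytes each).
chunk = 8 * 2
counts = {name: loads_needed(chunk, width) for name, width in LOAD_WIDTH_BYTES.items()}
print(counts)
```

Under these assumptions the 128-bit path needs a quarter of the load instructions of the 32-bit path. The wider loads also require the address to be aligned to the load width, so they are not always applicable.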
