vinx13 opened a new pull request, #16830: URL: https://github.com/apache/tvm/pull/16830
This PR makes storage shared among different functions after CUDA graph rewriting. Because the CUDA graph caches storages, storage objects are not freed after function execution, which increases memory usage when there are multiple functions. Making the storage objects shared eliminates this overhead.

It also updates the rewriting to avoid capturing storages and bindings that are used as function outputs. Previously we relied on the fact that output tensors are allocated with `R.builtin.alloc_tensor`; however, this behavior changed after storage planning was enabled for output tensors, which may also use `R.memory.alloc_memory`.

cc @tqchen
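To illustrate the memory-usage concern, here is a minimal conceptual sketch in plain Python (not TVM's actual API; the `Storage` class and `get_cached_storages` helper are hypothetical) of why sharing one set of cached storage objects across functions avoids duplicating memory that a captured CUDA graph keeps alive:

```python
from typing import Dict, List, Tuple


class Storage:
    """Stands in for a device allocation kept alive by a captured CUDA graph."""

    def __init__(self, size: int) -> None:
        self.size = size


# Module-level cache shared by all functions, keyed by the allocation plan.
_shared_storages: Dict[Tuple[int, ...], List[Storage]] = {}


def get_cached_storages(sizes: Tuple[int, ...]) -> List[Storage]:
    """Return one shared set of storages for a given allocation plan.

    Without sharing, each rewritten function would allocate (and the graph
    cache would retain) its own copy of these storages, so memory usage
    grows with the number of functions.
    """
    if sizes not in _shared_storages:
        _shared_storages[sizes] = [Storage(s) for s in sizes]
    return _shared_storages[sizes]


# Two functions with the same allocation plan reuse the same storage objects.
assert get_cached_storages((1024, 4096)) is get_cached_storages((1024, 4096))
```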
