vinx13 opened a new pull request, #16830:
URL: https://github.com/apache/tvm/pull/16830

   This PR makes storage shared among different functions after CUDA graph 
rewriting. Because CUDA graph caches storage, storage objects are not freed 
after function execution, which increases memory usage when there are multiple 
functions. Sharing the storage objects eliminates this overhead.
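   To illustrate the idea, here is a minimal, hypothetical sketch (the class and names below are illustrative stand-ins, not TVM APIs): with a per-function cache each function keeps its own captured storage alive, whereas a pool shared across functions hands every capture the same storage object for the same allocation request.

   ```python
   from typing import Dict, Tuple

   class SharedStoragePool:
       """Hypothetical pool: one storage buffer per (size, dtype) request,
       reused by every CUDA-graph capture instead of one copy per function."""

       def __init__(self) -> None:
           self._pool: Dict[Tuple[int, str], bytearray] = {}

       def alloc(self, size: int, dtype: str) -> bytearray:
           key = (size, dtype)
           if key not in self._pool:
               # Stand-in for allocating device storage.
               self._pool[key] = bytearray(size)
           return self._pool[key]

   pool = SharedStoragePool()
   a = pool.alloc(1024, "float32")  # first function's capture allocates
   b = pool.alloc(1024, "float32")  # second function's capture reuses it
   assert a is b
   ```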
   
   It also updates the rewriting to avoid capturing storages and bindings that 
are used as function outputs. Previously we relied on the fact that output 
tensors are allocated with `R.builtin.alloc_tensor`; however, this behavior 
changed after we enabled storage planning for output tensors, which may also 
allocate them with `R.memory.alloc_tensor`.
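   The two allocation patterns the rewriter must now leave uncaptured can be sketched in TVMScript (the modules, shapes, and dtypes below are illustrative, not taken from the PR): before storage planning an output tensor comes straight from `R.builtin.alloc_tensor`, while after planning the same output is built from an explicit storage via `R.memory.alloc_tensor`.

   ```python
   from tvm.script import ir as I
   from tvm.script import relax as R

   @I.ir_module
   class BeforePlanning:
       @R.function
       def main() -> R.Tensor((16,), dtype="float32"):
           R.func_attr({"relax.force_pure": True})
           # The old rewriter recognized function outputs by this op alone.
           out: R.Tensor((16,), dtype="float32") = R.builtin.alloc_tensor(
               R.shape([16]), dtype="float32", runtime_device_index=0
           )
           return out

   @I.ir_module
   class AfterPlanning:
       @R.function
       def main() -> R.Tensor((16,), dtype="float32"):
           R.func_attr({"relax.force_pure": True})
           # After storage planning the output is a storage + tensor pair,
           # so these bindings must also stay outside the captured graph.
           storage: R.Object = R.memory.alloc_storage(
               R.shape([64]), virtual_device_index=0,
               storage_scope="global", dtype="float32"
           )
           out: R.Tensor((16,), dtype="float32") = R.memory.alloc_tensor(
               storage, offset=0, shape=R.shape([16]), dtype="float32"
           )
           return out
   ```

   With both patterns excluded, output tensors stay outside the captured CUDA graph whether or not storage planning has run.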
   
   cc @tqchen 

