vvchernov opened a new pull request, #11003:
URL: https://github.com/apache/tvm/pull/11003

   I observed that the VirtualMachine::SetInputTensorWithIndex(...) method has 
a discrepancy between its description and its implementation (see also the 
description of VirtualMachine::SetInput(...), which uses this method and 
assumes zero copy where possible). When the source input is a DLTensor, the 
method always creates a new NDArray and copies the data into it, even if the 
devices are the same. The excess copying reduces performance for models with 
multiple inputs. This PR fixes the issue.
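
   To make the intended behavior concrete, here is a minimal Python sketch of the copy-only-when-needed rule described above. It is not the TVM implementation: `ToyTensor` and `set_input_tensor` are hypothetical names, and a `bytearray` with a device string stands in for a DLTensor.

```python
from dataclasses import dataclass


@dataclass
class ToyTensor:
    """Hypothetical stand-in for a DLTensor: a raw buffer plus a device tag."""
    data: bytearray
    device: str


def set_input_tensor(src: ToyTensor, target_device: str) -> ToyTensor:
    """Bind an input, copying only when the devices differ (assumed rule)."""
    if src.device == target_device:
        # Same device: reuse the caller's buffer, no copy.
        return src
    # Different device: allocate a new buffer and copy the data into it.
    return ToyTensor(data=bytearray(src.data), device=target_device)


cpu_input = ToyTensor(bytearray(b"\x01\x02\x03"), "cpu")
same_device = set_input_tensor(cpu_input, "cpu")
other_device = set_input_tensor(cpu_input, "gpu")
```

   In this sketch `same_device.data` is the very same buffer object as `cpu_input.data`, while `other_device.data` is a separate buffer with equal contents.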
   
   Note: I have a remark about the current design. VirtualMachine has only 
the `set_input` Python method, and the same method is used inside the `run` 
and `invoke` methods when input args are passed. There is no 
`set_input_zero_copy`. According to its description, `set_input` tries to 
avoid copying where possible. Theoretically, a problem can arise if 
`set_input` is called, the input tensors are released after that, and then 
`run` or `invoke` is launched. As far as I know, GraphExecutor does not have 
this problem.
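
   The hazard can be illustrated with a small Python sketch. It assumes a toy VM that stores its input at `set_input` time and reads it at `run` time. `ToyVM` and its `copy` flag are hypothetical and are not part of the TVM API. Python cannot show a dangling pointer, so overwriting the caller's buffer stands in for releasing it.

```python
class ToyVM:
    """Hypothetical VM that stores inputs at set_input and reads them at run."""

    def __init__(self):
        self._input = None

    def set_input(self, buf: bytearray, copy: bool) -> None:
        # copy=False keeps a zero-copy view of the caller's buffer;
        # copy=True takes a snapshot of the data.
        self._input = bytes(buf) if copy else memoryview(buf)

    def run(self) -> int:
        # Toy "model": sum the input bytes.
        return sum(self._input)


buf = bytearray([1, 2, 3])
zero_copy_vm, copying_vm = ToyVM(), ToyVM()
zero_copy_vm.set_input(buf, copy=False)
copying_vm.set_input(buf, copy=True)

# The caller reuses the buffer after set_input but before run.
for i in range(len(buf)):
    buf[i] = 0

zero_copy_result = zero_copy_vm.run()  # sees the overwritten data
copying_result = copying_vm.run()      # still sees the original data
```

   Here the zero-copy VM computes its result from the overwritten buffer, while the copying VM is unaffected. A real zero-copy input path would need the caller to keep the source tensor alive and unchanged until `run` or `invoke` completes.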
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.