vvchernov opened a new pull request, #11003: URL: https://github.com/apache/tvm/pull/11003
I observed that the `VirtualMachine::SetInputTensorWithIndex(...)` method has a discrepancy between its description and its implementation (see also the description of `VirtualMachine::SetInput(...)`, which uses this method and assumes zero copy where possible). When the source input is a DLTensor, it always creates a new NDArray and copies the data into it, even if the devices are the same. This reduces performance for models with multiple inputs because of the excess copying. This PR fixes the issue.

Note: I have a remark about the current design. VirtualMachine has only a `set_input` Python method, and the same method is used inside the `run` and `invoke` methods when input args are passed. But there is no `set_input_zero_copy`. In the description I observed that `set_input` tries to avoid copying where possible. In theory we can have a problem if `set_input` is called, the input tensors are released after that, and then `run` or `invoke` is launched. As far as I know, GraphExecutor does not have this problem.
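To make the two points above concrete, here is a minimal sketch in plain Python. It does not use TVM: `Tensor` and `set_input_tensor` are hypothetical stand-ins invented for illustration, and the device strings are arbitrary labels. The sketch only models the intended rule (share the buffer when the devices match, copy otherwise) and shows why a shared buffer ties the stored input to the caller's memory.

```python
from dataclasses import dataclass


@dataclass
class Tensor:
    """Hypothetical stand-in for a DLTensor/NDArray: a device label plus a buffer."""
    device: str
    data: bytearray


def set_input_tensor(src: Tensor, target_device: str) -> Tensor:
    """Toy model of the rule: share the buffer on the same device, copy otherwise."""
    if src.device == target_device:
        # Same device: wrap the existing buffer without copying.
        return Tensor(target_device, src.data)
    # Different device: a copy is unavoidable.
    return Tensor(target_device, bytearray(src.data))


host_input = Tensor("cpu", bytearray(b"\x01\x02\x03"))

shared = set_input_tensor(host_input, "cpu")  # zero-copy path
copied = set_input_tensor(host_input, "gpu")  # copying path

# The caller reuses its buffer after setting the input but before running.
host_input.data[0] = 0xFF

print(shared.data is host_input.data)  # the zero-copy input aliases the caller's buffer
print(shared.data[0], copied.data[0])  # only the shared input sees the change
```

In this toy model Python's reference counting keeps the shared buffer alive, so the sketch can only demonstrate aliasing. The concern raised in the note is the stronger case where the caller's memory is actually released between `set_input` and `run` or `invoke`.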
