vvchernov opened a new pull request, #11358: URL: https://github.com/apache/tvm/pull/11358
This is a draft implementation of the 'set_output_zero_copy' method on the VirtualMachine side.

Brief description of the approach:

1. There is a Python API function 'set_output' which saves external outputs in the VM's outputs_ field (a map) for the specified function name. It works like the 'set_input' method.
2. During 'invoke', the tensors from outputs_ are written into the register file. For this, the register index of the result is found from the code_ field.
3. Because the register already holds pre-allocated tensor(s), memory allocation for that register index is skipped. So far this has been done for the AllocTensor and AllocADT ops (in tests on different models I observed that they are used for result tensors). Other Alloc ops may also need to be updated.

Notes:

1. I'm not sure that it works with multiple frames. In practice it looks like we need code_ rather than the frame, and the number of frames does not change the op sequence (code_). I also observed that code_ does not depend on func_name; maybe it should, and it is not thread-safe at the moment.
2. It was implemented for CPU; I plan to check GPU specifics.
3. It seems that tensor(s) are allocated over storage whose memory is already prepared. This means the storage behind a skipped AllocTensor or AllocADT may still hold memory in RAM. That is not good, but it raises questions about usage scenarios and VM flexibility.

Hello @altanh and @mbs-octoml! Could you take a look at the draft?
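To make the three steps of the approach easier to discuss, here is a minimal Python sketch of the idea. It is a toy model, not TVM code: `ToyVM`, the `AllocTensor`/`Fill`/`Ret` instruction classes, and `alloc_count` are all hypothetical names invented for illustration, and looking up the result register from the `Ret` instruction is an assumption about how it could be found from code_.

```python
from dataclasses import dataclass


# Hypothetical toy instructions; they only loosely mirror VM opcodes.
@dataclass
class AllocTensor:
    dst: int   # destination register
    size: int  # number of elements to allocate


@dataclass
class Fill:
    dst: int      # register holding the buffer to write into
    value: float  # value written to every element


@dataclass
class Ret:
    result: int  # register holding the function result


class ToyVM:
    def __init__(self, code):
        self.code = code       # flat instruction list (stands in for code_)
        self.outputs = {}      # func_name -> external buffer (stands in for outputs_)
        self.alloc_count = 0   # counts allocations actually performed

    def set_output(self, func_name, buffer):
        # Step 1: remember the caller-provided output buffer.
        self.outputs[func_name] = buffer

    def invoke(self, func_name):
        registers = {}
        preset = set()
        if func_name in self.outputs:
            # Step 2: find the result register (assumed here to come from Ret)
            # and place the external buffer into the register file.
            result_reg = next(i.result for i in self.code if isinstance(i, Ret))
            registers[result_reg] = self.outputs[func_name]
            preset.add(result_reg)
        for instr in self.code:
            if isinstance(instr, AllocTensor):
                # Step 3: skip allocation when the register is already filled.
                if instr.dst in preset:
                    continue
                registers[instr.dst] = [0.0] * instr.size
                self.alloc_count += 1
            elif isinstance(instr, Fill):
                buf = registers[instr.dst]
                for k in range(len(buf)):
                    buf[k] = instr.value
            elif isinstance(instr, Ret):
                return registers[instr.result]
        return None


code = [AllocTensor(dst=0, size=4), Fill(dst=0, value=1.5), Ret(result=0)]

# With an external output: the result is written in place, nothing is allocated.
vm = ToyVM(code)
external = [0.0] * 4
vm.set_output("main", external)
result = vm.invoke("main")

# Without an external output: the VM allocates the result itself.
plain_vm = ToyVM(code)
plain_result = plain_vm.invoke("main")
```

In this sketch the caller's buffer is the returned object itself, so no copy is needed. The toy keeps a single instruction list for all function names, which mirrors the open question in the notes about code_ not depending on func_name.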
