vvchernov opened a new pull request, #11358:
URL: https://github.com/apache/tvm/pull/11358

   This is a draft implementation of the 'set_output_zero_copy' method on the 
VirtualMachine side.
   
   Brief description of the approach:
   1. There is a Python API function 'set_output' which saves external outputs 
in the VM outputs_ field (a map) under the specified func name. It is similar 
to the 'set_input' method.
   2. During 'invoke', the entries of outputs_ are written into the register 
file. To do this, the register index of the result is looked up in the code_ 
field.
   3. Because the pre-allocated tensor(s) are already in the register, memory 
allocation for that register index is skipped. So far this has been done for 
the AllocTensor and AllocADT ops (in tests on different models I observed that 
these are the ops used for result tensors). Other Alloc ops may also need to 
be updated.
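
   The three steps above can be sketched with a toy interpreter. This is only 
an illustration of the idea, not the TVM VirtualMachine code: the names ToyVM, 
Instr, alloc_count and the three-op instruction set are hypothetical, and plain 
Python lists stand in for tensors.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """A toy instruction; the real VM instruction set is much richer."""
    op: str         # "alloc_tensor", "fill", or "ret" (hypothetical ops)
    reg: int = 0    # register the instruction writes to or returns
    size: int = 0   # number of elements, used by "alloc_tensor"
    value: int = 0  # value written to every element by "fill"


class ToyVM:
    def __init__(self, code, num_regs):
        self.code = code        # plays the role of code_
        self.num_regs = num_regs
        self.outputs = {}       # plays the role of outputs_: func name -> buffer
        self.alloc_count = 0    # counts allocations actually performed

    def set_output(self, func_name, buf):
        # Step 1: remember the caller-owned buffer for this function.
        self.outputs[func_name] = buf

    def _result_register(self):
        # Step 2 helper: find the result register by scanning the code.
        for ins in self.code:
            if ins.op == "ret":
                return ins.reg
        raise ValueError("no ret instruction in code")

    def invoke(self, func_name):
        regs = [None] * self.num_regs
        preset = None
        if func_name in self.outputs:
            # Step 2: place the external buffer into the result register.
            preset = self._result_register()
            regs[preset] = self.outputs[func_name]
        for ins in self.code:
            if ins.op == "alloc_tensor":
                if ins.reg == preset:
                    continue  # Step 3: register already holds the buffer.
                regs[ins.reg] = [0] * ins.size
                self.alloc_count += 1
            elif ins.op == "fill":
                buf = regs[ins.reg]
                for i in range(len(buf)):
                    buf[i] = ins.value  # write in place, no copy
            elif ins.op == "ret":
                return regs[ins.reg]
        return None


code = [
    Instr("alloc_tensor", reg=0, size=4),
    Instr("fill", reg=0, value=7),
    Instr("ret", reg=0),
]

vm = ToyVM(code, num_regs=1)
out = [0] * 4                 # caller-owned output buffer
vm.set_output("main", out)
result = vm.invoke("main")    # result is the same object as out

plain_vm = ToyVM(code, num_regs=1)
plain_result = plain_vm.invoke("main")  # no set_output: the VM allocates
```

   With set_output the result lands directly in the caller's buffer and the 
allocation is skipped; without it the toy VM allocates as usual.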
   
   Notes: 1. I'm not sure that it works with multiple frames. In practice it 
looks like we need code_, not the frame, and the number of frames does not 
change the op sequence (code_). I also observed that code_ does not depend on 
func_name. Maybe it should; as it stands, this is not thread-safe.
   2. It was implemented for CPU; I plan to check GPU specifics.
   3. It seems that tensor(s) are allocated over storage with already prepared 
memory. This means that the storage behind a skipped AllocTensor or AllocADT 
can still hold memory in RAM. That is not good, and it raises questions about 
usage scenarios and VM flexibility.
   
   Hello @altanh and @mbs-octoml! Could you take a look at the draft?

