apeskov commented on PR #11642:
URL: https://github.com/apache/tvm/pull/11642#issuecomment-1157461403

   Hi @yangulei,
   
   As I remember, zero-copy input/output of tensors should already work in TVM 
main. If you define relay layouts that match DNNL expectations, the external 
buffer will be used as is, without any copying or processing. That was one of 
the goals of PR https://github.com/apache/tvm/pull/11345. Could you please be 
more specific about the scenarios you would like to optimise?
   
   Regarding `post-op sum`: you are absolutely right, a non-in-place op before 
`add` breaks correctness, so a memory copy is inevitable. With post-op sum the 
input data has to be placed into the DST tensor of the DNNL primitive, and 
execution of the primitive rewrites that data. In contrast, `post-op binary 
add` reads the data from a separate input tensor. Currently `binary add` has 
limited support across primitives, which leads to a `ref:any` implementation. 
It also has slightly worse performance, because it adds one more memory access 
pass.
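
   To make that data-flow difference concrete, here is a minimal sketch in 
plain C++ (no real DNNL calls: the "conv" is just a per-element scale, and all 
names are hypothetical):

   ```cpp
   #include <cstddef>
   #include <vector>

   // Post-op sum: the primitive accumulates into DST, so DST must already
   // hold the other addend before execution, and that data is rewritten.
   void conv_with_post_op_sum(const std::vector<float>& src,
                              std::vector<float>& dst, float weight) {
       for (std::size_t i = 0; i < src.size(); ++i)
           dst[i] = src[i] * weight + dst[i];  // reads and overwrites dst in place
   }

   // Post-op binary add: the addend lives in its own input tensor. No
   // in-place requirement on DST, but one extra memory access pass.
   void conv_with_binary_add(const std::vector<float>& src,
                             const std::vector<float>& addend,
                             std::vector<float>& dst, float weight) {
       for (std::size_t i = 0; i < src.size(); ++i)
           dst[i] = src[i] * weight + addend[i];
   }
   ```

   Both produce the same values when DST is pre-loaded with the addend; they 
differ only in where the addend is read from and whether DST is rewritten.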
   
   In case of missing layouts, the DNNL BYOC runtime automatically injects the 
required reorder primitives. It will look like this:
   
   ``` 
                          bias --
                                 \
      in1 -> REORDER_1 -> tmp_1 -> CONV -> tmp_3 -> REORDER_3 -> out
                                        /
                     in2 -> REORDER_2 --
   ```
   The problem is in tensor tmp_3. There are 2 primitives which produce the 
data of tmp_3, and that breaks the concept of a data flow graph: `REORDER_2` 
must be executed strictly before the `CONV` primitive. If you take a look at 
the code related to in-place simulation in this patch ([link to 
it](https://github.com/apeskov/tvm/blob/054901196b5c562f70208b0d9394d16e305e6269/src/runtime/contrib/dnnl/dnnl_json_runtime.cc#L771-L788))
 you will see exactly what I said. Essentially, it just copies the input data 
to the DST tensor **exactly** before the convolution primitive.
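
   In plain C++ the in-place simulation amounts to something like the sketch 
below (a toy per-element "conv" stands in for the primitive; the real code in 
the link operates on DNNL memory objects, so all names here are hypothetical):

   ```cpp
   #include <cstddef>
   #include <cstring>
   #include <vector>

   // Sketch of the in-place simulation: if the `add` input (in2) did not end
   // up in the conv DST buffer, copy it there right before the primitive
   // runs, so the post-op sum accumulates onto the correct data and then
   // rewrites it.
   void run_conv_with_inplace_sum(const std::vector<float>& in1,
                                  const std::vector<float>& in2,
                                  std::vector<float>& dst, float weight) {
       // The copy happens *exactly* before the convolution primitive executes.
       std::memcpy(dst.data(), in2.data(), in2.size() * sizeof(float));
       // Toy conv stand-in with post-op sum semantics: dst = conv(in1) + dst.
       for (std::size_t i = 0; i < in1.size(); ++i)
           dst[i] = in1[i] * weight + dst[i];
   }
   ```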
   
   `Post-op sum` is very tricky and has a lot of requirements. It works only 
if proper layouts for `conv` and `add` were selected, and it requires 
validating that the input tensor memory may be rewritten (for ResNet50 this 
holds, but in the arbitrary case it has to be checked).
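
   One possible shape for that validation (a hypothetical helper, not the PR's 
actual code): rewriting the `add` input in place is only safe when the fused 
conv+sum is the sole consumer of that tensor in the graph.

   ```cpp
   #include <map>
   #include <string>
   #include <vector>

   // Hypothetical eligibility check: rewriting the `add` input in place is
   // only safe when no other primitive reads that tensor, i.e. the fused
   // conv+sum is its single consumer.
   bool can_rewrite_in_place(
       const std::map<std::string, std::vector<std::string>>& consumers,
       const std::string& tensor) {
       auto it = consumers.find(tensor);
       return it != consumers.end() && it->second.size() == 1;
   }
   ```

   The ResNet50 skip connection passes such a check; a tensor that also feeds 
another branch would not.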

