barry-jin commented on pull request #19685:
URL: https://github.com/apache/incubator-mxnet/pull/19685#issuecomment-780982802


   I have replaced the backend APIs (MXInvokeCachedOp and 
MXNET_REGISTER_GLOBAL("cached_op.invoke")) with simple dummy implementations. 
Removing the computational cost this way fully exposes the overhead of the API 
call itself, with and without this PR. The results are shown below: 
   
   Without this PR, a CachedOp invocation takes around 7.22 us, and most of 
the overhead is in constructing the cython/python args. With this PR, a 
CachedOp invocation takes around 4.041 us, and most of the overhead is in 
type translation/checking in the packedfunc system. 
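   
   For anyone who wants to try the same idea outside MXNet, here is a minimal 
sketch of the measurement approach: stub out the backend so it does no work, 
then time the call path around it. This is not the code used for the numbers 
above. The names dummy_backend, invoke_with_marshalling and per_call_us are 
hypothetical, and the int() conversion is only a stand-in for argument 
marshalling.

```python
import timeit


def dummy_backend(*args):
    # Stand-in for a stubbed-out backend call: does no real computation,
    # so any measured time is call overhead only.
    return None


def invoke_with_marshalling(*args):
    # Hypothetical wrapper that mimics per-call argument conversion
    # before handing off to the backend.
    converted = [int(a) for a in args]
    return dummy_backend(*converted)


def per_call_us(fn, args, number=100_000, repeat=5):
    # Best-of-N timing to reduce scheduler noise; result is in microseconds.
    best = min(timeit.repeat(lambda: fn(*args), number=number, repeat=repeat))
    return best / number * 1e6


if __name__ == "__main__":
    args = (1, 2, 3)
    print(f"direct call:     {per_call_us(dummy_backend, args):.3f} us")
    print(f"with marshalling: {per_call_us(invoke_with_marshalling, args):.3f} us")
```

   The difference between the two printed figures approximates the cost of 
the extra argument handling, since the backend itself is a no-op in both 
cases.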
   
   CachedOp Invocation in cython code: 
   <img width="855" alt="Screen Shot 2021-02-17 at 5 42 22 PM" 
src="https://user-images.githubusercontent.com/69359374/108292283-a5e89800-7148-11eb-8088-55cfd0f7a6b0.png">
   CachedOp Invocation with new FFI implementation (accelerated by cython): 
   <img width="992" alt="Screen Shot 2021-02-17 at 5 42 30 PM" 
src="https://user-images.githubusercontent.com/69359374/108292290-aa14b580-7148-11eb-9a95-4feec7fedc10.png">
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

