barry-jin commented on pull request #19685:
URL: https://github.com/apache/incubator-mxnet/pull/19685#issuecomment-780982802
I have replaced the backend APIs (MXInvokeCachedOp and
MXNET_REGISTER_GLOBAL("cached_op.invoke")) with simple or dummy
implementations. This removes the computational cost, so we can fully expose
the overhead of the API call itself, with and without this PR. The results are
as follows:
Without this PR, a CachedOp invocation call takes around 7.22 us, and most of
the overhead is in constructing the cython/python args. With this PR, a
CachedOp invocation call takes around 4.041 us, and most of the overhead is in
type translation/checking in the packedfunc system.
CachedOp Invocation in cython code:
<img width="855" alt="Screen Shot 2021-02-17 at 5 42 22 PM"
src="https://user-images.githubusercontent.com/69359374/108292283-a5e89800-7148-11eb-8088-55cfd0f7a6b0.png">
CachedOp Invocation with new FFI implementation (accelerated by cython):
<img width="992" alt="Screen Shot 2021-02-17 at 5 42 30 PM"
src="https://user-images.githubusercontent.com/69359374/108292290-aa14b580-7148-11eb-9a95-4feec7fedc10.png">
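For readers who want to try the same kind of measurement, here is a minimal sketch of the approach described above: the backend is replaced by a no-op so that only the call-path overhead is timed. It is not the code used in this PR. All names (`dummy_backend`, `invoke_with_arg_packing`, `invoke_direct`, `per_call_us`) are hypothetical, the ctypes packing only imitates per-call argument construction, and the numbers it prints will not match the figures reported above.

```python
import ctypes
import timeit


def dummy_backend(*args):
    """Hypothetical stand-in for a backend entry point with the computation removed."""
    return None


def invoke_with_arg_packing(handles):
    # Assumed "old path": build C-compatible argument arrays on every call.
    arr = (ctypes.c_void_p * len(handles))(*handles)
    num = ctypes.c_int(len(handles))
    return dummy_backend(arr, num)


def invoke_direct(handles):
    # Assumed "new path": pass the Python objects straight through.
    return dummy_backend(*handles)


def per_call_us(fn, handles, number=100_000, repeat=5):
    """Return the best-of-`repeat` mean time per call, in microseconds."""
    timer = timeit.Timer(lambda: fn(handles))
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e6


if __name__ == "__main__":
    handles = [1, 2, 3, 4]  # placeholder integers standing in for array handles
    for fn in (invoke_with_arg_packing, invoke_direct):
        print(f"{fn.__name__}: {per_call_us(fn, handles):.3f} us per call")
```

Taking the minimum over several repeats reduces noise from the scheduler and other processes, which matters when the quantity being measured is only a few microseconds.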