reminisce commented on issue #17097: [RFC][mxnet 2.0][item 10.1] MXNet Imperative Op Invocation Overhead
URL: https://github.com/apache/incubator-mxnet/issues/17097#issuecomment-568233782
 
 
   @ptrendx Yes, there is an ongoing effort to profile the engine code flow 
using VTune. We hope the exercise will pinpoint the hotspots that account for 
most of the latency. The pure C++ part is also split roughly 50% vs. 50% 
between setup code (shape/type inference, memory allocation, dependency setup) 
and op scheduling.
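   VTune gives the real breakdown. As a much cruder complement, the fixed 
cost of a single Python-to-C crossing through ctypes can be estimated with the 
standard library alone. The sketch below is illustrative only: it assumes a 
POSIX system where `ctypes.util.find_library("c")` locates the C runtime, uses 
libc's `labs` purely as a stand-in for a trivial foreign function (it is not 
MXNet's code path), and the helper names `per_call_ns` and `py_abs` are 
hypothetical. The numbers it prints depend entirely on the machine.

```python
import ctypes
import ctypes.util
import timeit


def per_call_ns(stmt, globals_, number=200_000, repeat=5):
    """Estimate the cost of one execution of `stmt`, in nanoseconds.

    timeit disables garbage collection while timing; taking the minimum over
    several repeats filters out noise from other processes.
    """
    best = min(timeit.repeat(stmt, repeat=repeat, number=number, globals=globals_))
    return best / number * 1e9


# Assumption: a POSIX system where find_library("c") locates the C runtime
# (e.g. "libc.so.6" on Linux). labs() is only a stand-in for a trivial C function.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.labs.argtypes = [ctypes.c_long]
libc.labs.restype = ctypes.c_long


def py_abs(x):
    """Pure-Python baseline doing equivalent work without crossing into C."""
    return x if x >= 0 else -x


if __name__ == "__main__":
    env = {"libc": libc, "py_abs": py_abs}
    print(f"ctypes call: {per_call_ns('libc.labs(-5)', env):8.1f} ns/call")
    print(f"Python call: {per_call_ns('py_abs(-5)', env):8.1f} ns/call")
```

   The difference between the two lines approximates the per-call toll that 
ctypes adds (argument conversion plus the foreign-function trampoline). A toll 
of this kind is paid on every imperative op invocation that goes through ctypes.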
   
   For the "fast path" data structures, here is a summary of the items 
(including the ones suggested by @sxjscience):
   
   - `tuple` and `list`, since they are interchangeable in NumPy semantics for 
representing shapes and axes.
   - `str`, because `einsum` takes a string parameter and the op can be used 
intensively in transformer models.
   - `py_slice`, `Ellipsis`, `None` for basic indexing. We can go one step 
further by moving the whole indexing dispatch logic to the backend.
   - NumPy scalars.
   - `mx.context.Context`. A single call to `mx.cpu()` can take as long as 
600 ns through ctypes. One idea is to do it the pybind way, i.e. create a 
Python binding for the backend `Context` class.
   - `np.dtype`. Similar to `Context`.
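   Taken together, the list amounts to a fixed set of Python types that the 
frontend could recognise with cheap exact-type checks and pack into a tagged 
form before crossing into C++. The following is a minimal, hypothetical sketch 
of that idea in pure Python: `TypeCode`, `convert_arg`, `convert_args`, the 
`Context` stand-in and its numeric device codes are illustrative names, not 
MXNet's API. `np.dtype` is left out but would follow the same pattern as 
`Context`, and in the actual design this conversion would live on the C++ side 
(e.g. behind a pybind binding as suggested above) rather than in Python.

```python
import numbers
from dataclasses import dataclass
from enum import IntEnum


class TypeCode(IntEnum):
    """Hypothetical tags a C++ backend could switch on without calling back into Python."""
    INT = 0
    FLOAT = 1
    BOOL = 2
    STR = 3
    INT_TUPLE = 4   # shapes/axes; tuple and list are treated the same
    SLICE = 5
    ELLIPSIS = 6
    NONE = 7
    CONTEXT = 8


@dataclass(frozen=True)
class Context:
    """Hypothetical stand-in for mx.context.Context: (device type, device id)."""
    device_type: str
    device_id: int = 0


_DEVICE_TYPE_IDS = {"cpu": 1, "gpu": 2}  # hypothetical numeric device codes


def convert_arg(arg):
    """Map one Python argument to a (TypeCode, payload) pair, or raise TypeError.

    Exact-type checks come first because they are the cheapest test for the
    common cases; the numbers ABCs then catch NumPy integer and floating
    scalars, which NumPy registers as numbers.Integral / numbers.Real.
    """
    t = type(arg)
    if t is bool:                  # must precede int: bool subclasses int
        return TypeCode.BOOL, arg
    if t is int:
        return TypeCode.INT, arg
    if t is float:
        return TypeCode.FLOAT, arg
    if t is str:                   # e.g. einsum subscripts "ij,jk->ik"
        return TypeCode.STR, arg
    if t is tuple or t is list:    # interchangeable for shapes and axes
        return TypeCode.INT_TUPLE, tuple(int(v) for v in arg)
    if t is slice:
        return TypeCode.SLICE, (arg.start, arg.stop, arg.step)
    if arg is Ellipsis:
        return TypeCode.ELLIPSIS, None
    if arg is None:
        return TypeCode.NONE, None
    if t is Context:
        return TypeCode.CONTEXT, (_DEVICE_TYPE_IDS[arg.device_type], arg.device_id)
    # Slower fallbacks, e.g. NumPy scalars such as np.int64 or np.float32.
    if isinstance(arg, numbers.Integral):
        return TypeCode.INT, int(arg)
    if isinstance(arg, numbers.Real):
        return TypeCode.FLOAT, float(arg)
    raise TypeError(f"no fast path for argument of type {t.__name__}")


def convert_args(args):
    """Convert the full argument list of one op call into tagged values."""
    return [convert_arg(a) for a in args]
```

   For example, `convert_args([(2, 3), "ij,jk->ik", slice(None, 2), Ellipsis, 
None, Context("gpu", 1)])` yields one tagged pair per argument. Checking `bool` 
before `int` matters because `bool` is a subclass of `int` in Python, and 
giving `tuple` and `list` a single tag mirrors the NumPy convention that either 
can describe a shape or a set of axes.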
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
