tqchen commented on issue #17097: [RFC][mxnet 2.0][item 10.1] MXNet Imperative 
Op Invocation Overhead
URL: 
https://github.com/apache/incubator-mxnet/issues/17097#issuecomment-569139957
 
 
   @larroy indeed, every solution has trade-offs. These trade-offs are 
discussed in the posts above where we compare solutions, and they are backed 
by benchmarks :) It would be great if you could also suggest potential 
trade-offs here.
   
   When you expose an API from a statically typed language (C++) to a dynamic 
language (Python), you have to type-erase it: Python functions do not carry 
static type information, so that information has to be passed along with the 
arguments.
   
   The only difference between approaches is where you do the type checking 
(verifying that the Python type corresponds to the right C++ type) and the 
translation (converting the value to the C++ type).
   
   For example, in the case of pybind, the erasure happens implicitly when you 
call the Python function, and the checking and translation happen when the 
call enters the C++ function.
   
   In the case of creating a C API for each feature and wrapping it on the 
Python side, both the type checking and the translation are done on the 
Python side.
   
   In the case of the TVM FFI, the type translation is done on the 
Python/Cython side, while the type checking is done in C++.
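
   A minimal Python sketch of the difference in where checking and 
translation happen, simulated with `ctypes` values rather than a real C++ 
library. The function names and the numeric type codes are hypothetical and 
are not taken from MXNet or TVM.

```python
import ctypes

# Hypothetical type codes, chosen for illustration only.
INT_CODE, FLOAT_CODE = 0, 1


def wrap_for_c_api(value):
    """Per-feature C API style: check AND translate on the Python side."""
    if not isinstance(value, float):
        raise TypeError("expected a float")
    return ctypes.c_double(value)


def translate_only(value):
    """FFI style: the Python side only translates into a tagged slot."""
    if isinstance(value, float):
        return (FLOAT_CODE, ctypes.c_double(value))
    return (INT_CODE, ctypes.c_int64(value))


def callee_expecting_float(slot):
    """Stand-in for the C++ callee: the type check happens here."""
    code, raw = slot
    if code != FLOAT_CODE:
        raise TypeError("expected FLOAT_CODE")
    return raw.value * 2.0
```

   In the first style a wrong argument is rejected before the call leaves 
Python; in the second it is rejected by the callee when it inspects the tag.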
   
   To dive deeper into the trade-offs of the PackedFunc calling convention: 
the convention erases the type by storing a type code alongside each argument. 
This brings the additional cost of passing arguments through the heap, as 
opposed to registers. So it might not be suited to inline functions that need 
to run on the order of 1e-9 s; however, for API functions that need to run at 
around the 1e-7 s or even 1e-8 s level, this convention is pretty good.
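
   As a rough illustration of a type-erased convention, here is a small Python 
sketch in which every function takes a list of (type code, value) pairs. The 
`TypeCode` enum, its values, and `packed_add` are assumptions made up for this 
example; they are not TVM's actual definitions.

```python
from enum import IntEnum
from typing import Any, List, Tuple


class TypeCode(IntEnum):
    # Hypothetical codes, not TVM's real values.
    INT = 0
    FLOAT = 1
    STR = 2


# An erased argument is a (type_code, value) pair.
ErasedArg = Tuple[TypeCode, Any]


def packed_add(args: List[ErasedArg]) -> ErasedArg:
    """Single erased signature: tagged values in, one tagged value out."""
    (code_a, a), (code_b, b) = args
    # The callee knows what it expects, so checking is one comparison
    # per argument.
    if code_a != TypeCode.INT or code_b != TypeCode.INT:
        raise TypeError("packed_add expects two INT arguments")
    return (TypeCode.INT, a + b)


result = packed_add([(TypeCode.INT, 2), (TypeCode.INT, 3)])
```

   Because every function shares the same erased signature, one entry point 
can serve all of them, at the cost of building the argument list on each call.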
   
   In terms of the calling cost, it really depends on whether the caller and 
callee are strongly typed.
   - If the caller is strongly typed, then assigning the type code is O(1).
   - If the caller is dynamically typed (like Python), then we need a 
dispatcher to select the right type code.
   - If the callee is strongly typed, then the cost of checking is O(1): just 
check that the code is the expected one.
   - If the callee is dynamically typed, then a dispatch needs to happen, 
which adds another level of hash table lookup, O(1).
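
   The four cases above can be sketched in Python as follows. This is an 
illustration under assumed names (`pack_dynamic`, `typed_callee`, 
`dynamic_callee`) and assumed type codes, with dictionaries standing in for 
the hash table lookups.

```python
from typing import Any, Callable, Dict, List, Tuple

INT, FLOAT, STR = 0, 1, 2  # hypothetical type codes
Slot = Tuple[int, Any]

# Dynamic caller: one hash lookup on the runtime type selects the code.
_CODE_FOR_TYPE: Dict[type, int] = {int: INT, float: FLOAT, str: STR}


def pack_dynamic(value: Any) -> Slot:
    code = _CODE_FOR_TYPE.get(type(value))
    if code is None:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    return (code, value)


# Strongly typed callee: compare each code with the expected one.
def typed_callee(args: List[Slot]) -> float:
    (c0, x), (c1, k) = args
    if c0 != FLOAT or c1 != INT:
        raise TypeError("expected (FLOAT, INT)")
    return x * k


# Dynamic callee: a second hash lookup picks a handler per code.
_HANDLERS: Dict[int, Callable[[Any], str]] = {
    INT: lambda v: f"int:{v}",
    FLOAT: lambda v: f"float:{v}",
    STR: lambda v: f"str:{v}",
}


def dynamic_callee(args: List[Slot]) -> List[str]:
    return [_HANDLERS[code](value) for code, value in args]
```

   A strongly typed caller would skip `pack_dynamic` and write the code 
directly, for example `(FLOAT, 1.5)`.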
   
   As we can see, the only place where dispatching is necessary is the dynamic 
type handling case. Even in these cases, if there is a strong need for 
specialization, we can force the type directly by running the check on the 
caller side and passing in the right type code (the engineering burden is the 
same as wrapping the C API). However, the benchmarks suggest that the dynamic 
dispatching cost is reasonable and satisfies the API speed requirements.
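
   To make the specialization option concrete, here is a hedged Python sketch 
comparing a generic call path (one lookup per argument) with a hand-written 
wrapper that checks on the caller and passes fixed type codes. All names and 
codes are hypothetical.

```python
from typing import Any, List, Tuple

INT, FLOAT = 0, 1  # hypothetical type codes
Slot = Tuple[int, Any]
_CODE_FOR_TYPE = {int: INT, float: FLOAT}


def callee(args: List[Slot]) -> float:
    """Strongly typed callee: expects (FLOAT, INT)."""
    (c0, x), (c1, k) = args
    if c0 != FLOAT or c1 != INT:
        raise TypeError("expected (FLOAT, INT)")
    return x * k


def call_generic(*values: Any) -> float:
    """Generic path: one hash lookup per argument picks the code."""
    return callee([(_CODE_FOR_TYPE[type(v)], v) for v in values])


def call_specialized(x: float, k: int) -> float:
    """Specialized path: check on the caller, then pass fixed codes."""
    if type(x) is not float or type(k) is not int:
        raise TypeError("expected (float, int)")
    return callee([(FLOAT, x), (INT, k)])
```

   The specialized wrapper has to be written once per function, which is the 
same per-function effort as wrapping a dedicated C API.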
   
   Coming back to the trade-off, the main one here is the engineering burden 
of keeping an hourglass design (with a fixed set of APIs) vs. efficiency. 
While my post did not suggest that TVM's FFI is a silver bullet, it does work 
pretty well for our use cases. Hope it helps.
   

