@larroy indeed, every solution has trade-offs. These trade-offs are discussed 
in the posts above where we compare solutions, and they are backed by 
benchmarks :) It would be great if you could also suggest potential trade-offs 
here.

When you expose an API from a statically typed language (C++) to a dynamic 
language (Python), you have to type-erase it: Python functions carry no static 
type information, so the type information has to be passed along with the call.

The only difference between the approaches is where you do the type checking 
(verifying that the Python type corresponds to the right C++ type) and the 
translation (converting the value to the C++ type).

For example, in the case of pybind, the erasure is done implicitly when you 
call the Python function; checking and translation then happen when you call 
into the C++ function.

In the case of creating a C API for each feature and wrapping it on the Python 
side, both the type checking and the translation are done on the Python side.

In the case of the TVM FFI, the type translation is done on the Python/Cython 
side, while the type checking is done in C++.

To dive deeper into the trade-offs of the PackedFunc calling convention: the 
convention erases the type by storing a type code alongside each argument. 
This brings the additional cost of passing arguments through the heap, as 
opposed to registers. So it might not be suited to inline functions that need 
to run on the order of 1e-9 s; however, for API functions that need to run at 
around the 1e-7 s or even 1e-8 s level, this convention is pretty good.
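
To make the idea of "type codes stored with the arguments" concrete, here is a 
minimal Python sketch. It is only an illustration of the convention, not TVM's 
actual implementation: the `TypeCode` values and the `packed_add` function are 
hypothetical names made up for this example.

```python
from enum import IntEnum


class TypeCode(IntEnum):
    """Hypothetical type codes (assumed for illustration, not TVM's values)."""
    INT = 0
    FLOAT = 1
    STR = 2


def packed_add(values, type_codes):
    """Type-erased callee: every exposed function shares this one signature.

    The arguments arrive as two parallel sequences, the raw values and the
    type code of each value, and the callee checks the codes itself.
    """
    if len(values) != 2 or len(type_codes) != 2:
        raise TypeError("expected exactly 2 arguments")
    for code in type_codes:
        if code not in (TypeCode.INT, TypeCode.FLOAT):
            raise TypeError(f"unsupported type code {code!r}")
    return values[0] + values[1]


# The caller packs the values together with their type codes.
result = packed_add([1, 2.5], [TypeCode.INT, TypeCode.FLOAT])
```

Because every function takes the same `(values, type_codes)` pair, a single 
entry point is enough to call any of them, which is the point of the erasure.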

In terms of the calling cost, it really depends on whether the caller and the 
callee are strongly typed.
- If the caller is strongly typed, then assigning the type code is O(1).
- If the caller is dynamically typed (like Python), then we need a dispatcher 
to select the right type code.
- If the callee is strongly typed, then the cost of checking is O(1): it just 
checks that the code is the expected one.
- If the callee is dynamically typed, then a dispatch needs to happen, which 
adds another level of hashtable lookup, O(1).
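
The cases above can be sketched roughly as follows. This is a simplified 
illustration under assumed names (`pack`, `typed_callee`, `dynamic_callee`, 
and the integer codes are all hypothetical), not how any particular FFI is 
implemented.

```python
INT, FLOAT, STR = 0, 1, 2  # hypothetical type codes

# Dynamic caller: a dispatcher maps each runtime type to its type code.
_CODE_OF = {int: INT, float: FLOAT, str: STR}


def pack(*args):
    """Select a type code per argument with one hash lookup each."""
    try:
        codes = [_CODE_OF[type(a)] for a in args]
    except KeyError as err:
        raise TypeError(f"unsupported argument type: {err}") from None
    return list(args), codes


def typed_callee(values, codes):
    """Strongly typed callee: one comparison against the expected codes."""
    if codes != [INT]:
        raise TypeError("expected a single int argument")
    return values[0] * 2


# Dynamic callee: one more hashtable lookup picks the implementation.
_DOUBLE_IMPL = {
    INT: lambda v: v * 2,
    FLOAT: lambda v: v * 2.0,
    STR: lambda v: v + v,
}


def dynamic_callee(values, codes):
    """Dispatch on the type code of the first argument."""
    return _DOUBLE_IMPL[codes[0]](values[0])
```

For example, `typed_callee(*pack(21))` goes through the caller-side dispatcher 
and then a single code check, while `dynamic_callee(*pack("ab"))` pays for a 
lookup on both sides.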

As we can see, the only place where dispatching is necessary is the dynamic 
type handling case. Even in these cases, if there is a strong need for 
specialization, we can directly force the type by running the check on the 
caller side and passing in the right type code (the engineering burden is the 
same as wrapping the C API). However, the benchmarks suggest that the dynamic 
dispatching cost is reasonable and fast enough for API calls.
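
As a rough sketch of what "forcing the type on the caller" could look like, 
the wrapper below checks its own arguments and passes a fixed tuple of type 
codes, so no per-call dispatcher runs. All names here (`packed_scale`, 
`scale`, the integer codes) are hypothetical and chosen only for this example.

```python
INT, FLOAT = 0, 1  # hypothetical type codes


def packed_scale(values, codes):
    """Type-erased callee that expects the codes (INT, FLOAT)."""
    if codes != (INT, FLOAT):
        raise TypeError("expected (int, float)")
    return values[0] * values[1]


# The codes are fixed once for this wrapper instead of being looked up
# for every call.
_SCALE_CODES = (INT, FLOAT)


def scale(count, factor):
    """Specialized caller: checks its own arguments and forces the codes."""
    if type(count) is not int or type(factor) is not float:
        raise TypeError("scale expects (int, float)")
    return packed_scale((count, factor), _SCALE_CODES)
```

The cost is one hand-written wrapper per function, which is the same kind of 
work as wrapping a dedicated C API for that function.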

Coming back to the trade-off, the main trade-off here is the engineering 
burden of keeping an hourglass design (with a fixed set of APIs) vs. 
efficiency. While my post did not suggest that TVM's FFI is a silver bullet, 
it does work pretty well for our use cases. Hope it helps.


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-mxnet/issues/17097#issuecomment-569139957
