@larroy Indeed, every solution has tradeoffs. These tradeoffs are discussed in the posts above where we compare solutions, and they are backed by benchmarks :) It would be great if you could also suggest potential tradeoffs here.

When you expose an API from a typed language (C++) to a dynamic language (Python), you have to type-erase it: Python functions carry no static types, so the type information has to be passed along with the values. The only difference between solutions is where you do the type checking (verifying that the Python type corresponds to the right C++ type) and the translation (converting to the C++ type).

For example, in the case of pybind, the erasure is done implicitly when you call the Python function, and the checking and translation happen when you call into the C++ function. In the case of creating a C API for each feature and wrapping things on the Python side, both the type checking and the translation are done on the Python side. In the case of the TVM FFI, the type translation is done on the Python/Cython side, while the type checking is done in C++.

To dive deeper into the tradeoffs of the PackedFunc calling convention: the convention erases the type by storing the type code in the arguments. This brings the additional cost of passing arguments through the heap, as opposed to registers. So it might not be suited to inline functions that need to run on the order of 1e-9 s; however, for API functions that need to run at around the 1e-7 s or even 1e-8 s level, this convention is pretty good.

In terms of the calling cost, it really depends on whether the caller and callee are strongly typed.

- If the caller is strongly typed, then assigning the type code is O(1).
- If the caller is dynamically typed (like Python), then we need a dispatcher to select the right type code.
- If the callee is strongly typed, then the cost of checking is O(1): just check that the code is the correct one.
- If the callee is dynamically typed, then a dispatch needs to happen, which adds another level of hashtable lookup, O(1).

As we can see, the only place where dispatching is necessary is the dynamic type handling case.

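
To make the four cases concrete, here is a minimal sketch of the type-code idea in plain Python. It is only an illustration under stated assumptions: `TypeCode`, `pack_args`, `call_packed`, and `add_ints` are hypothetical names, and the codes are made up rather than TVM's actual ones. It shows a dynamic caller picking type codes through one hashtable lookup per argument, and a strongly typed callee checking them with a plain comparison.

```python
from enum import IntEnum


class TypeCode(IntEnum):
    """Hypothetical type codes for illustration; not TVM's real values."""
    INT = 0
    FLOAT = 1
    STR = 2


# Dynamic caller side: one hashtable lookup per argument selects the code.
_DISPATCH = {int: TypeCode.INT, float: TypeCode.FLOAT, str: TypeCode.STR}


def pack_args(*args):
    """Type-erase the arguments into parallel lists of values and codes."""
    values, codes = [], []
    for arg in args:
        try:
            code = _DISPATCH[type(arg)]
        except KeyError:
            raise TypeError(
                f"unsupported argument type: {type(arg).__name__}"
            ) from None
        values.append(arg)
        codes.append(code)
    return values, codes


def add_ints(values, codes):
    """Strongly typed callee: checking is a comparison against fixed codes."""
    if codes != [TypeCode.INT, TypeCode.INT]:
        raise TypeError("add_ints expects (INT, INT)")
    return values[0] + values[1]


def call_packed(func, *args):
    """Dynamic call path: dispatch to type codes, then invoke the callee."""
    values, codes = pack_args(*args)
    return func(values, codes)
```

A dynamic caller goes through the dispatcher with `call_packed(add_ints, 1, 2)`. A caller that already knows its argument types can skip the lookup and assign the codes directly, as in `add_ints([1, 2], [TypeCode.INT, TypeCode.INT])`. Both calls return `3`, and a mismatched call such as `call_packed(add_ints, 1, 2.0)` raises `TypeError` in the callee's check.
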
Even in these cases, if there is a strong need for specialization, we can directly force the type by running the check on the caller side and passing in the right type code (the engineering burden is the same as wrapping the C API). However, the benchmark suggests that the dynamic dispatching cost is reasonable and satisfies the API speed requirements.

Coming back to the tradeoff, the main tradeoff here is the engineering burden of keeping an hourglass design (with a fixed set of APIs) vs. efficiency. While my post did not suggest that TVM's FFI is a silver bullet, it does work pretty well for our use cases.

Hope it helps.