Thanks @tqchen for sharing the PoC code on such short notice. :) The numbers look promising even with native Python objects being deep-copied. Pybind performs a deep copy by default unless the receiving type on the C++ side is marked as `opaque`, in which case the Python object is passed by reference. That is typically used to propagate changes to large objects from C++ back to Python. In our op invocation use cases, there has been no pressing need to introduce this level of complexity so far, since the Python objects are small and parameter passing is a one-way trip. The 300ns overhead gives us a good starting point for squeezing the total overhead into the 2us range. If we really do need to pass `PyObject`s in the future, we can always add that behind a compile flag option. I think it's worth following [this branch](https://github.com/tqchen/tvm/tree/poc-pyffi) to integrate TVM FFI with the MXNet op invocation flow and get more comprehensive benchmark results.
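
To make the copy-versus-reference distinction concrete, here is a minimal pure-Python sketch. It is only an analogy: it does not use pybind or the TVM FFI, and the function names `call_by_copy` and `call_by_reference` are hypothetical stand-ins for an op invocation on the C++ side.

```python
import copy

def call_by_copy(shape):
    # Analogy for the default conversion: the callee works on its own copy,
    # so nothing it does is visible to the caller.
    local = copy.deepcopy(shape)
    local.append(1)
    return len(local)

def call_by_reference(shape):
    # Analogy for an opaque type: the callee mutates the caller's object.
    shape.append(1)
    return len(shape)

shape = [2, 3]

n_copy = call_by_copy(shape)
after_copy = list(shape)   # snapshot of the caller's list after the copy path

n_ref = call_by_reference(shape)
after_ref = list(shape)    # snapshot after the by-reference path
```

With small arguments and one-way parameter passing, the copy path is all that op invocation needs. The by-reference path only matters when the callee has to hand changes back to Python.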