Thanks @tqchen for sharing the PoC code within such a short timeframe. :) The 
numbers look promising even with Python native objects deeply copied. Pybind 
performs a deep copy by default unless the receiving type on the C++ end is 
marked as `opaque`, in which case the Python object is passed by reference. That is often used 
for propagating changes to large objects from C++ back to Python. In our op invocation 
use cases, there has been no urgent need to introduce this level of 
complexity so far, since the Python objects are small and parameter passing is a 
one-way trip. The 300ns overhead should give us a good start to squeeze the 
total overhead into the 2us range. If there is really a need to pass 
`PyObject`s in the future, we can always add that behind a compile flag. I 
think it's worth following [this branch](https://github.com/tqchen/tvm/tree/poc-pyffi) 
to integrate TVM FFI with MXNet op invocation flow to get more comprehensive 
benchmark results.
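
For the benchmark itself, here is a minimal sketch of how the per-call overhead could be measured on the Python side. It is not taken from the PoC branch: `per_call_ns`, `noop`, and `fake_op` are hypothetical names, and `fake_op` is only a stand-in that would be replaced with the real FFI-backed op. Subtracting the `noop` baseline roughly cancels the cost of the timing wrapper.

```python
import timeit


def per_call_ns(fn, *args, number=100_000, repeat=5):
    """Return the best-of-`repeat` average wall time per call, in nanoseconds."""
    timer = timeit.Timer(lambda: fn(*args))
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e9


def noop(*args):
    """Baseline: a Python call that does nothing with its arguments."""
    return None


def fake_op(lhs, rhs, axis):
    """Hypothetical stand-in for an FFI-backed op; swap in the real binding."""
    return (lhs, rhs, axis)


if __name__ == "__main__":
    base = per_call_ns(noop, 1, 2, 0)
    op = per_call_ns(fake_op, 1, 2, 0)
    # The delta approximates the extra cost of the op over a bare call.
    print(f"baseline: {base:.0f} ns/call, op: {op:.0f} ns/call, "
          f"delta: {op - base:.0f} ns")
```

Taking the minimum over several repeats is a deliberate choice here, since at the sub-microsecond scale the slower runs mostly reflect scheduler noise rather than the call path.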

