arcadiaphy edited a comment on issue #16431: [RFC] MXNet Multithreaded Inference Interface
URL: https://github.com/apache/incubator-mxnet/issues/16431#issuecomment-562052116
 
 
@anirudh2290 Just saw this RFC. Let me share what I've done for multithreaded inference; I think it's the only viable approach in MXNet right now.
   
I've deployed many models with the Scala API and run them in multiple threads. The whole system has been running smoothly in a production environment for more than 2 months.
   
The inference backend is the graph executor, one created per thread with shared model parameters. Each thread's executor can be reshaped dynamically and independently according to the shape of its input data.
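
A minimal Python sketch of the pattern (my deployment is through the Scala API, but the idea is the same); the checkpoint name and shapes are placeholders, and it assumes `data` is the symbol's only non-parameter input:

```python
import threading
import mxnet as mx

# Load the symbol and parameters once. The parameter NDArrays are shared
# read-only by every per-thread executor, so only activations are duplicated.
# 'resnet-50' / epoch 0 are placeholder checkpoint names.
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-50', 0)

def worker(data):
    # Bind a per-thread graph executor directly against the shared parameter
    # arrays (assumes 'data' is the only non-parameter input).
    args = dict(arg_params)
    args['data'] = data
    exe = sym.bind(ctx=mx.cpu(), args=args, aux_states=aux_params, grad_req='null')
    exe.forward(is_train=False)
    print(exe.outputs[0].shape)
    # When the next input has a different shape, reshape this thread's executor
    # instead of rebinding from scratch, e.g.:
    #   exe = exe.reshape(allow_up_sizing=True, data=(8, 3, 224, 224))

threads = [threading.Thread(target=worker, args=(mx.nd.zeros((1, 3, 224, 224)),))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```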
   
As mentioned above, the dependency engine is not thread safe, so running inference on the threaded engine leads to deadlocks and core dumps. That leaves the naive engine as the only option. Without dependency scheduling, though, any write dependency on the model parameters may be executed simultaneously from several threads and corrupt the internal data. If MKL-DNN is used to accelerate inference, you will get non-deterministic results from one inference to the next, because MXNet stealthily reorders the data in the NDArray (a write dependency) for MKL-DNN operators. I've used a temporary workaround for this issue that is not suitable for an official PR.
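
For reference, the naive engine is selected via an environment variable that has to be set before libmxnet is loaded. Disabling MKL-DNN with `MXNET_MKLDNN_ENABLED` is shown only as the blunt way to sidestep the reorder write dependency; it is not my workaround and it gives up the MKL-DNN speedup:

```python
import os

# The engine type is read once when libmxnet is loaded, so set it before the
# first `import mxnet` anywhere in the process.
os.environ['MXNET_ENGINE_TYPE'] = 'NaiveEngine'

# Blunt alternative that avoids the MKL-DNN reorder (write dependency) entirely,
# at the cost of the MKL-DNN speedup. Not the temporary workaround mentioned above.
os.environ['MXNET_MKLDNN_ENABLED'] = '0'

import mxnet as mx
```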
   
Multithreaded inference should be used with caution. Sharing model parameters reduces the memory footprint of your program, but a lot of memory is still consumed by global resources (temporary workspace, random number generator, ...) and the MKL-DNN op cache, which are stored in static thread_local variables. So the **number of threads** is the most important factor for memory footprint: any thread that performs an MXNet operation, even a trivial imperative operator call, incurs memory overhead by creating its own set of thread_local variables. I've spent a lot of time tracking down memory leaks, and the best solution is to limit the number of threads.
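
Concretely, that means funnelling every MXNet call through one small, fixed-size pool instead of letting each request thread touch MXNet directly. A rough Python sketch (`bind_executor` is a hypothetical helper wrapping the shared-parameter binding from the first sketch):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Only MAX_WORKERS threads ever run MXNet code, so only MAX_WORKERS copies of
# the thread_local workspace / MKL-DNN op cache are allocated.
MAX_WORKERS = 4
_pool = ThreadPoolExecutor(max_workers=MAX_WORKERS)
_tls = threading.local()

def _run(batch):
    # One cached executor per pool thread; bind_executor is a hypothetical
    # helper wrapping the shared-parameter bind from the earlier sketch.
    if not hasattr(_tls, 'exe'):
        _tls.exe = bind_executor(batch.shape)
    _tls.exe.forward(is_train=False, data=batch)
    return _tls.exe.outputs[0].asnumpy()

def predict(batch):
    # Request threads never call into MXNet themselves; they only submit work.
    return _pool.submit(_run, batch).result()
```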
   
A new way to do multithreaded inference on the threaded engine would be very welcome here; it would solve the above issues automatically.
