anirudh2290 commented on issue #16654: Multithreaded Inference Support
URL: https://github.com/apache/incubator-mxnet/pull/16654#issuecomment-575030986
 
 
   > On the other hand, I am not sure if we can accept this cached op without 
bulking and subgraph support. Subgraph is used for multiple accelerators for 
MXNet inference. Also, bulking is essential to inference speed on GPUs. For 
instance, we saw 20% perf regression if bulking scope is smaller than the 
optimal scope #9055 and it would be even more significant if no bulking is 
enabled at all. I am not sure if this can be shipped with these unknown 
limitations. Thinking about how to better support thread safe bulking and 
subgraph may affect the implementation and design.
   
   1. I have added bulking support. 
   
   2. Regarding subgraphing: if I understand correctly, MXNet still doesn't support the 
subgraph API with Gluon. It is supported only with the symbol API, and the 
MXNET_SUBGRAPH_BACKEND env variable works only with the graph executor. What I 
meant to say is that the subgraph param in CachedOp is not supported. However, you 
can use a symbol whose nodes have already been converted and replaced with 
subgraph ops with the cached_op_threadsafe version. I have added a test to 
demonstrate that, which I think should address your concern with respect to 
subgraphing.
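   As a side note on point 1, bulking in MXNet is typically toggled via standard 
environment variables (these predate this PR and are not introduced by it); a 
minimal sketch:
   
   ```shell
   # Standard MXNet env vars controlling op bulking (not specific to this PR):
   export MXNET_EXEC_BULK_EXEC_INFERENCE=1       # enable bulking for inference
   export MXNET_EXEC_BULK_EXEC_TRAIN=1           # enable bulking for training
   export MXNET_EXEC_BULK_EXEC_MAX_NODE_TRAIN=15 # cap bulked segment size
   ```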
   
   > Do we have evidence that the newly introduced cached op is performant? I'm 
concerned about the mutex for the whole forward function - that means no thread 
can concurrently push operations to the engine. What I observed is - that could 
take a long time depending on the model used.
   
   Although the mutex in forward serializes only the pushing of ops, which may 
introduce small delays between threads, the execution of those ops is still 
scheduled in parallel, especially when there are no dependencies between ops from 
different threads, so performance should improve.
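   To illustrate the design described above, here is a conceptual stdlib-only 
sketch (not MXNet code; `forward`, `push_lock`, and `engine` are hypothetical 
stand-ins): the mutex serializes only the cheap *push* of work to the engine, 
while the engine executes independent ops concurrently on its worker pool.
   
   ```python
   # Conceptual model of "serialized push, parallel execution".
   import threading
   import time
   from concurrent.futures import ThreadPoolExecutor, wait
   
   push_lock = threading.Lock()                 # plays the role of the forward() mutex
   engine = ThreadPoolExecutor(max_workers=4)   # plays the role of the dependency engine
   
   def dummy_op():
       time.sleep(0.1)   # stands in for real kernel execution
       return 1
   
   def forward(op, results):
       # Only the (cheap) submission is serialized; execution overlaps.
       with push_lock:
           fut = engine.submit(op)
       results.append(fut)
   
   results = []
   threads = [threading.Thread(target=forward, args=(dummy_op, results))
              for _ in range(4)]
   start = time.time()
   for t in threads:
       t.start()
   for t in threads:
       t.join()
   wait(results)
   elapsed = time.time() - start
   # Four 0.1 s ops overlap on the pool, so total wall time stays
   # well below the 0.4 s a fully serialized design would need.
   ```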
   
   https://github.com/awslabs/djl ran some benchmarks and saw throughput 
improvements of up to 2x with the naive engine. I also saw performance 
improvements of up to 1.7x when increasing the number of worker threads on the 
threaded engine for CPU. 
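   For anyone wanting to reproduce this kind of measurement, a minimal, 
hypothetical throughput-measurement pattern (stdlib only; `infer` is a stand-in 
for a real model forward pass, not the DJL benchmark itself) looks like:
   
   ```python
   # Measure requests/sec for a fixed workload at different worker counts.
   import time
   from concurrent.futures import ThreadPoolExecutor
   
   def infer(_):
       time.sleep(0.01)  # stand-in for a real model forward pass
       return 1
   
   def throughput(num_workers, num_requests=100):
       start = time.time()
       with ThreadPoolExecutor(max_workers=num_workers) as pool:
           results = list(pool.map(infer, range(num_requests)))
       assert len(results) == num_requests
       return num_requests / (time.time() - start)
   
   single = throughput(1)
   multi = throughput(4)
   # With more workers the same workload finishes sooner, so req/s rises.
   ```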
