anirudh2290 commented on issue #16654: Multithreaded Inference Support URL: https://github.com/apache/incubator-mxnet/pull/16654#issuecomment-575030986

> On the other hand, I am not sure if we can accept this cached op without bulking and subgraph support. Subgraph is used for multiple accelerators for MXNet inference. Also, bulking is essential to inference speed on GPUs. For instance, we saw 20% perf regression if bulking scope is smaller than the optimal scope #9055 and it would be even more significant if no bulking is enabled at all. I am not sure if this can be shipped with these unknown limitations. Thinking about how to better support thread safe bulking and subgraph may affect the implementation and design.

1. I have added bulking support.
2. For subgraphing: if I am correct, MXNet still doesn't support the subgraph API with Gluon. It is supported only with symbols, and the MXNET_SUBGRAPH_BACKEND env variable works only with the graph executor. What I meant to say is that the `subgraph` param in CachedOp is not supported. You can, however, take a symbol that has already been converted and replaced with subgraph ops and use it with the cached_op_threadsafe version. I have added a test to demonstrate that, which should address your concern with respect to subgraphing.

> Do we have evidence that the newly introduced cached op is performant? I'm concerned about the mutex for the whole forward function - that means no thread can concurrently push operations to the engine. What I observed is - that could take a long time depending on the model used.

Although the mutex in forward serializes only the pushing of ops, which may introduce delays between threads, the execution of those ops is still scheduled in parallel, especially when there are no dependencies between the ops pushed by different threads, so performance should improve. https://github.com/awslabs/djl ran some benchmarks and was able to see throughput improvements of up to 2x with the naive engine.
I was also able to see performance improvements of up to 1.7x when increasing the number of worker threads on the threaded engine for CPU.
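The concurrency argument above — a mutex that serializes only the *push* of operations, while the engine schedules their execution on its own worker threads — can be sketched in plain Python. The `Engine`, `push`, and `forward` names below are hypothetical stand-ins to illustrate the pattern; this is not MXNet's actual C++ API:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class Engine:
    """Toy stand-in for the dependency engine: ops pushed to it
    are executed asynchronously on a pool of worker threads."""

    def __init__(self, num_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=num_workers)

    def push(self, op, *args):
        # Scheduling returns immediately with a future; the op body
        # runs later on one of the engine's worker threads.
        return self._pool.submit(op, *args)

# One global mutex guarding forward, mirroring the design under discussion.
_forward_mutex = threading.Lock()
_engine = Engine()

def forward(op, x):
    # Only the push is serialized across caller threads...
    with _forward_mutex:
        fut = _engine.push(op, x)
    # ...but execution of independent ops from different callers
    # overlaps on the engine's workers once they are pushed.
    return fut.result()
```

The point of the sketch is that holding the mutex only for the (cheap) push delays callers briefly, while the (expensive) op execution still proceeds concurrently on the engine side whenever the ops have no mutual dependencies.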
