hkvision opened a new issue #17822: [Question] Distributed training performance for one worker and one server on the same node
URL: https://github.com/apache/incubator-mxnet/issues/17822

Hi, I'm following https://mxnet.apache.org/api/faq/distributed_training.html for distributed training (using mxnet-mkl, running on multiple CPU nodes). Are there any performance benchmarks for running distributed training with one worker and one server on the same node? Or any best practices for utilizing the resources well while still getting good performance? In my runs, performance degrades when I place one worker and one server on a single node, each occupying half of the cores. Thanks a lot in advance!
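
For context, a common way to co-locate one worker and one server per node is to launch with MXNet's `tools/launch.py` and cap each process's thread pool so the two processes do not contend for the same cores. The sketch below assumes a 2-node cluster reachable over SSH, 16 physical cores per node, and a placeholder training script `train.py`; the hostnames, core counts, and script name are illustrative, not from the issue.

```shell
# Hypothetical host file listing the two CPU nodes (placeholder hostnames).
cat > hosts <<EOF
node1
node2
EOF

# Cap OpenMP/MKL and MXNet CPU worker threads to half the cores, so the
# worker and the server sharing a node each get roughly 8 of 16 cores.
# (Assumption: 16 physical cores per node; adjust to your machine.)
export OMP_NUM_THREADS=8
export MXNET_CPU_WORKER_NTHREADS=8

# -n: number of workers, -s: number of servers; the ssh launcher places
# one worker and one server on each host listed in the host file.
python tools/launch.py -n 2 -s 2 -H hosts --launcher ssh \
    python train.py --kv-store dist_sync
```

Whether the environment variables propagate to the remote processes depends on the launcher setup, so it may be safer to export them inside the training script or a wrapper script on each node.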
