Hi,

Currently, distributed training in MXNet can only be done using the
parameter server. Horovod is an open-source distributed training
framework that has shown roughly 2x speedup compared to TensorFlow
with the parameter server approach. We propose adding Horovod support
to MXNet. This will help our users achieve the goal of near-linear
scalability to 256 GPUs and beyond. The design proposal is on cwiki:

https://cwiki.apache.org/confluence/display/MXNET/Horovod-MXNet+Integration

Please feel free to let me know if you have any suggestions or feedback.

Regards,
Carl
