Hi,

Currently, we have two methods for single-machine communication:
parameter server and NCCL ring reduction. Both have drawbacks. The
parameter server does not distinguish between NVLink and PCI-E
connections, so it uses the higher-latency, lower-bandwidth PCI-E links
as often as it uses NVLink. NCCL uses the ring all-reduce algorithm,
whose number of communication steps grows linearly with the number of
GPUs, which gives it higher theoretical latency than some other
algorithms. I am working on a topology-aware approach that addresses
these limitations. The design proposal is on the cwiki:
https://cwiki.apache.org/confluence/display/MXNET/Single+machine+All+Reduce+Topology-aware+Communication
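To make the latency point concrete, here is a rough sketch that counts only the per-message latency (alpha) terms of the usual alpha-beta cost model. Ring all-reduce is a reduce-scatter followed by an all-gather, each taking p - 1 steps. For comparison, it assumes a plain binary-tree reduce followed by a broadcast, about ceil(log2 p) rounds each. The tree variant is only an illustrative assumption, not necessarily what the proposal uses, and bandwidth terms are ignored. For large messages, ring is bandwidth-optimal.

```python
def ring_allreduce_latency_steps(p: int) -> int:
    # Reduce-scatter (p - 1 steps) followed by all-gather (p - 1 steps).
    return 2 * (p - 1)


def tree_allreduce_latency_steps(p: int) -> int:
    # Illustrative binary tree: reduce up to a root, then broadcast back down.
    # (p - 1).bit_length() equals ceil(log2(p)) for p >= 1, without float error.
    return 2 * (p - 1).bit_length()


for p in (2, 4, 8, 16):
    print(p, ring_allreduce_latency_steps(p), tree_allreduce_latency_steps(p))
```

On an 8-GPU machine, this counts 14 latency steps for ring versus 6 for the tree. The gap widens as the number of GPUs grows.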

Please feel free to let me know if you have any suggestions.

Regards,
Carl