I added a few more figures showing how I arrived at the MXNET_KVSTORE_GPUARRAY_BOUND value [Figures 7(b) and 7(c)]. I ran a microbenchmark measuring runtime in seconds against message size sent using MXNet's KVStore. Figure 7(b) shows a crossover point around 1M: beyond it, multi-tree seems to deliver higher bandwidth; below it, single tree does.
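For reference, the benchmark itself is just timed push-pulls through the KVStore. A minimal sketch of what I mean (key names, sizes, and iteration counts are illustrative, not the exact ones I used; with iters=150 it becomes the "150 push-pulls before waiting" setup of Figure 7(c)):

    import time
    import mxnet as mx

    def time_push_pull(kv, size, ctxs, iters=10):
        key = str(size)                   # one key per size; re-initializing a key is an error
        kv.init(key, mx.nd.zeros((size,)))
        grads = [mx.nd.ones((size,), ctx=c) for c in ctxs]
        outs = [mx.nd.zeros((size,), ctx=c) for c in ctxs]
        mx.nd.waitall()                   # exclude setup cost from the timing
        start = time.time()
        for _ in range(iters):
            kv.push(key, grads)           # reduce across GPUs
            kv.pull(key, out=outs)        # broadcast the result back
        mx.nd.waitall()                   # push/pull are asynchronous; block here
        return (time.time() - start) / iters

    ctxs = [mx.gpu(i) for i in range(8)]  # assumes an 8-GPU machine
    kv = mx.kv.create("device")           # single-machine, GPU-to-GPU kvstore
    for size in [2 ** p for p in range(16, 27)]:  # ~65K to ~64M elements
        print(size, time_push_pull(kv, size, ctxs))

The waitall() at the end matters: push and pull only enqueue work, so timing without it would measure enqueue latency rather than communication.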
However, the 150 push-pulls before waiting microbenchmark [Figure 7(c)]
shows the crossover point around 10M if we extrapolate its behaviour to
the right. The points past the right edge could not be plotted because
memory consumption grew too high, since I am using 150 push-pulls of
fairly large arrays as a proxy for neural network parameters. Combined
with the parameter sweep over MXNET_KVSTORE_GPUARRAY_BOUND on VGG shown
in Figure 7(a) (a rough sketch of how such a sweep can be driven is at
the end of this message), this suggests that 10M is preferable to 1M.

For the multiple-root case, I currently generate 8 trees, one rooted at
each GPU. When doing single-tree Reduce and Broadcast, I use only the
first tree; this performed better than using different roots in the
single-tree case.

Regards,
Carl

On 6/25/18, Pedro Larroy <[email protected]> wrote:
> Nice design document. Where does the default value of
> MXNET_KVSTORE_GPUARRAY_BOUND (10M) come from?
> Do you generate a tree for each GPU?
>
> Pedro.
>
>
> On Mon, Jun 18, 2018 at 2:30 PM Carl Yang <[email protected]> wrote:
>
>> Hi,
>>
>> Currently, we have two methods for single-machine communication:
>> parameter server and NCCL ring reduction. Both of these methods have
>> some downsides. The parameter server does not differentiate between
>> NVLink and PCI-E connections, so it ends up using the higher-latency,
>> slower PCI-E links as often as NVLink. NCCL uses the ring-reduce
>> algorithm, which has higher theoretical latency than other algorithms.
>> I am working on a topology-aware approach that addresses these
>> limitations. The design proposal is on cwiki:
>>
>> https://cwiki.apache.org/confluence/display/MXNET/Single+machine+All+Reduce+Topology-aware+Communication
>>
>> Please feel free to let me know if you have any suggestions.
>>
>> Regards,
>> Carl
>>
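P.S. A rough sketch of how the Figure 7(a) sweep can be driven. The
training script name and sweep values are placeholders, and I am
assuming the tree-based kvstore is toggled via MXNET_KVSTORE_USETREE;
the bound must be set in the environment before MXNet starts, hence the
subprocess per run:

    import os
    import subprocess

    # "train_vgg.py" and the sweep values below are placeholders,
    # not the exact script or values used for Figure 7(a).
    for bound in [10 ** 6, 10 ** 7, 10 ** 8]:
        env = dict(os.environ)
        env["MXNET_KVSTORE_USETREE"] = "1"                # enable tree-based kvstore
        env["MXNET_KVSTORE_GPUARRAY_BOUND"] = str(bound)  # single-/multi-tree crossover
        subprocess.run(["python", "train_vgg.py"], env=env, check=True)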
