Thanks a lot!
On Tue, Apr 21, 2015 at 5:51 PM, 陈海波 <[email protected]> wrote:

> Hi~,
> In our previous work on deep learning with GPUs, we focused on parallel
> training of DNNs (without convolution layers) for speech recognition.
> It is not easy to adopt a model-parallelization strategy to speed up
> training, and considering the cost of transferring a big model between
> nodes, we decided to train on a single node with multiple GPUs. We use
> CUDA APIs to transfer messages between GPUs (both with and without
> GPUDirect). Under this plan, the problem of multi-node communication
> does not arise.
>
> Some discussions:
> 1) We think cudamat is a good choice for linear algebra computation, but
> we noticed that you use the mshadow library to develop SINGA.
> As we know, mshadow provides a GPU Matrix/Tensor template library and
> also supports some simple interfaces for multi-GPU, so we think we can
> keep using mshadow for linear algebra computation on both GPU and CPU.

Yes. We will continue using mshadow.

> 2) We consulted NVIDIA's officials, and they answered that they are not
> sure whether ZeroMQ supports GPUDirect and InfiniBand; they suggested
> that we adopt OpenMPI.

ZeroMQ should support InfiniBand (http://zeromq.org/area:results), but it may
not support GPUDirect. It seems Caffe
(https://github.com/BVLC/caffe/blob/parallel/src/caffe/parallel.cpp) is
implementing distributed training using GPUs + InfiniBand, but GPUDirect is
not used. I will learn more about GPUDirect and discuss with you. Another
solution that I am trying is to provide a general messaging API (like
https://github.com/dmlc/rabit) with different implementations (ZeroMQ or
MPI). I think we can discuss this more.

> thanks~
>
> On 2015-04-21 12:05:19, 陈海波 <[email protected]> wrote:
>
> > As planned in the previous discussion, we are stabilizing the APIs of
> > each module.
> > One problem I have encountered concerns the communication APIs to
> > support GPUs.
> >
> > We can use libraries like cudamat
> > (https://code.google.com/p/cudamat/) for linear algebra computation.
> > Hence, the APIs for computation would be almost the same as those for
> > CPUs. But I have little knowledge of the communication between GPU and
> > CPU, or between GPUs, so I am asking for your suggestions.
> >
> > Wangyuan, Wuwei and Haibo: since you are working on deep learning
> > using GPUs, it would be appreciated if you could give some feedback.
> >
> > As far as I know, messages are traditionally transferred from GPU
> > memory to CPU memory, sent through TCP/IP to other nodes, and then
> > transferred from CPU memory back to GPU memory. We can easily support
> > such communication using the current APIs for CPUs, but the transfers
> > between GPU and CPU would bring extra cost.
> > NVIDIA provides a technique called GPUDirect, which enables message
> > passing directly from GPU memory to the network (e.g., InfiniBand)
> > card. Some MPI variants now use this technique. But since we have
> > switched from MPI to ZeroMQ, we need to make sure that ZeroMQ supports
> > GPUDirect and InfiniBand. Do you have any investigations on this? Or
> > how do you implement the message transfer in your implementation?
> >
> > Thanks.
> >
> > regards,
> > Wei
