[ 
https://issues.apache.org/jira/browse/SINGA-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226179#comment-15226179
 ] 

ASF subversion and git services commented on SINGA-134:
-------------------------------------------------------

Commit a3c82ca913859d690f7bbf7b4706686de0b4d2a8 in incubator-singa's branch 
refs/heads/master from [~flytosky]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=a3c82ca ]

SINGA-134 Extend SINGA to run over a GPU cluster

Minor changes to extend single-node multi-GPU training to the GPU
cluster scenario.
1. Remove gethostip, which is not stable (it only works on some OSes). Users
now have to configure the hostfile so that each line specifies the IP address
(it must be an IP, not a hostname) of one node.
2. Register the process ID regardless of the running environment.
3. singa-run.sh has to `source` the '.profile' file ('.bashrc' does not work)
right before executing singa; this exports LD_LIBRARY_PATH for the cuDNN and
CUDA libraries. Sourcing it causes no problem if singa is compiled without CUDA.
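Because the hostfile must now list one IP address per line, a pre-flight check can catch hostnames before a cluster launch. The sketch below is a hypothetical helper, not part of SINGA or singa-run.sh; the name `parse_hostfile` and the choice to skip blank lines and '#' comments are assumptions made for illustration.

```python
import ipaddress
from typing import List


def parse_hostfile(text: str) -> List[str]:
    """Return node IPs from hostfile contents, one IP per line.

    Hypothetical validator: rejects hostnames, since the hostfile must
    contain IP addresses now that gethostip is no longer used.
    """
    hosts = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#"):
            continue
        try:
            ipaddress.ip_address(line)  # accepts IPv4 or IPv6, raises on hostnames
        except ValueError:
            raise ValueError(f"line {lineno}: {line!r} is not an IP address") from None
        hosts.append(line)
    return hosts
```

For example, `parse_hostfile("192.168.0.1\n10.0.0.2\n")` returns both addresses, while a line such as `node01` raises a `ValueError` naming the offending line.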

Checked with cpplint.


> Extend SINGA to run over a GPU cluster
> --------------------------------------
>
>                 Key: SINGA-134
>                 URL: https://issues.apache.org/jira/browse/SINGA-134
>             Project: Singa
>          Issue Type: New Feature
>            Reporter: wangwei
>              Labels: GPU, cluster
>
> Currently SINGA is able to run over a cluster of nodes using CPU and over a 
> single node with multiple GPUs.
> This ticket is going to extend SINGA to run over a GPU cluster.
> The framework is applicable to such a training environment.
> We need to update the code for allocating the GPU workers on different nodes
> and for message passing between GPUs on different nodes (refer to
> SINGA-133).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
