[ https://issues.apache.org/jira/browse/SINGA-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226179#comment-15226179 ]
ASF subversion and git services commented on SINGA-134:
-------------------------------------------------------
Commit a3c82ca913859d690f7bbf7b4706686de0b4d2a8 in incubator-singa's branch
refs/heads/master from [~flytosky]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=a3c82ca ]
SINGA-134 Extend SINGA to run over a GPU cluster
Minor changes to extend single-node multi-GPU training to the GPU cluster
scenario.
1. Remove gethostip, which is not stable (it only works on some OSes). Users now
have to configure the hostfile so that each line specifies the IP address of one
node (it must be an IP address, not a hostname).
2. Register the process ID regardless of the running environment.
3. singa-run.sh has to `source` the '.profile' file ('.bashrc' does not work)
right before executing singa, because that file exports LD_LIBRARY_PATH for the
cuDNN and CUDA libraries. This causes no problem if singa is compiled without
CUDA.
Checked with cpplint.
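As a rough illustration of the hostfile requirement in item 1, the Python sketch below parses hostfile contents with one IP address per line and rejects any entry that is not an IP literal (for example, a hostname). `read_hostfile` is a hypothetical helper, not SINGA's actual parser, and skipping blank lines is an assumption.

```python
import ipaddress


def read_hostfile(text):
    """Parse hostfile contents: one IP address per non-empty line.

    Hypothetical helper for illustration; SINGA's own parser may differ.
    Raises ValueError if a line is not a valid IPv4/IPv6 address.
    """
    hosts = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        entry = line.strip()
        if not entry:
            continue  # assumption: blank lines are ignored
        try:
            ipaddress.ip_address(entry)  # raises ValueError for hostnames
        except ValueError:
            raise ValueError(
                f"line {lineno}: {entry!r} is not an IP address") from None
        hosts.append(entry)
    return hosts


hosts = read_hostfile("192.168.0.1\n192.168.0.2\n")
print(hosts)  # ['192.168.0.1', '192.168.0.2']
```

Validating entries up front surfaces a hostname in the hostfile as an immediate, line-specific error rather than a later connection failure.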
> Extend SINGA to run over a GPU cluster
> --------------------------------------
>
> Key: SINGA-134
> URL: https://issues.apache.org/jira/browse/SINGA-134
> Project: Singa
> Issue Type: New Feature
> Reporter: wangwei
> Labels: GPU, cluster
>
> Currently SINGA is able to run over a cluster of nodes using CPU and over a
> single node with multiple GPUs.
> This ticket is going to extend SINGA to run over a GPU cluster.
> The framework is applicable to such a training environment.
> We need to update the code for allocating the GPU workers on different nodes
> and for message passing between GPUs on different nodes (refer to
> SINGA-133).
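To make the worker-allocation part of the task concrete, here is a minimal sketch of one possible placement policy. It assumes every node has the same number of GPUs and runs one worker per GPU, and it fills each node's GPUs before moving to the next node (block placement). `place_workers` and its parameters are hypothetical and are not SINGA's actual allocation code.

```python
def place_workers(hosts, gpus_per_node, nworkers):
    """Map each global worker id to a (host, local_gpu_id) pair.

    Hypothetical block-placement sketch: worker w runs on node
    w // gpus_per_node, using local GPU w % gpus_per_node.
    """
    capacity = len(hosts) * gpus_per_node
    if nworkers > capacity:
        raise ValueError(f"{nworkers} workers exceed {capacity} available GPUs")
    placement = []
    for worker_id in range(nworkers):
        node = worker_id // gpus_per_node      # fill one node before the next
        local_gpu = worker_id % gpus_per_node  # device index on that node
        placement.append((hosts[node], local_gpu))
    return placement


print(place_workers(["10.0.0.1", "10.0.0.2"], 2, 3))
# [('10.0.0.1', 0), ('10.0.0.1', 1), ('10.0.0.2', 0)]
```

Block placement keeps workers that share a node adjacent, which can help when intra-node GPU communication is cheaper than the cross-node message passing referred to above; a round-robin policy would spread load more evenly across nodes instead.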
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)