[
https://issues.apache.org/jira/browse/SINGA-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601209#comment-14601209
]
ASF subversion and git services commented on SINGA-8:
-----------------------------------------------------
Commit 884b9d70a631bee4961fb3907e47a747c5dd2b89 in incubator-singa's branch
refs/heads/master from wang wei
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=884b9d7 ]
SINGA-8 Implement distributed Hogwild
The original Param objects are sliced to make the size of parameters mastered
by server groups (roughly) equal.
Following Caffe's implementation, we let each server group master a subset of
param slices.
Each server group updates all model parameters for its corresponding worker
groups and synchronizes with other server groups on the slices it masters.
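As a rough illustration of the slicing idea above (not the code in this commit), the sketch below cuts the concatenated parameters into contiguous slices of near-equal size, one per server group. The function name slice_params, the (name, start, end) segment layout, and the one-slice-per-group assignment are assumptions made for this example only.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, int, int]  # (param name, start offset, end offset), hypothetical layout


def slice_params(param_sizes: Dict[str, int], num_groups: int) -> List[List[Segment]]:
    """Split all parameters into num_groups contiguous slices of near-equal size.

    Slice g is the part that server group g would master (an assumption of this sketch).
    """
    total = sum(param_sizes.values())
    base, extra = divmod(total, num_groups)
    # The first `extra` slices take one more element so the sizes differ by at most 1.
    targets = [base + (1 if g < extra else 0) for g in range(num_groups)]

    slices: List[List[Segment]] = [[] for _ in range(num_groups)]
    group = 0
    room = targets[0]  # elements the current slice can still take
    for name, size in param_sizes.items():
        offset = 0
        while offset < size:
            while room == 0:  # current slice is full, move on to the next one
                group += 1
                room = targets[group]
            take = min(room, size - offset)
            slices[group].append((name, offset, offset + take))
            offset += take
            room -= take
    return slices


def slice_size(segments: List[Segment]) -> int:
    """Number of parameter elements covered by one slice."""
    return sum(end - start for _, start, end in segments)


if __name__ == "__main__":
    sizes = {"w1": 10, "b1": 2, "w2": 6}  # made-up Param sizes
    for gid, segs in enumerate(slice_params(sizes, 3)):
        print(gid, slice_size(segs), segs)
```

A large Param may be split across several groups, and small ones may share a slice, which is what keeps the per-group load roughly balanced.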
Tested on a single node with multiple processes, each of which has one server
group with one server and one worker group with one worker.
The training loss decreases more slowly than with shared-memory Hogwild. TODO: optimize
and test on multiple nodes.
> Implement distributed Hogwild
> -----------------------------
>
> Key: SINGA-8
> URL: https://issues.apache.org/jira/browse/SINGA-8
> Project: Singa
> Issue Type: New Feature
> Reporter: wangwei
> Assignee: wangwei
> Labels: distributed, features, hogwild
>
> Both the Downpour framework of Google Brain [1] and Caffe's
> distributed Hogwild implementation extend shared-memory
> Hogwild training. This ticket follows the second one.
> Specifically, each server group masters a subset of parameters (i.e., Param
> objects) when synchronizing with other server groups. It aggregates all
> updates for its subset and sends (e.g., broadcasts) the updated
> parameters back to all other server groups. The synchronization is conducted
> asynchronously. The frequency can be fixed in the first implementation.
> Eventually, it should be tuned automatically to fully utilize the network
> bandwidth.
> [1] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. V. Le, M. Z. Mao, M.
> Ranzato, A. W. Senior, P. A. Tucker, K. Yang, and A. Y. Ng. Large scale
> distributed deep networks. In NIPS, pages 1232-1240, 2012.
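The synchronization described in the issue can be pictured with a small single-process simulation. This is only a sketch under stated assumptions: ServerGroup, apply_update, and sync_slice are hypothetical names, each group is assumed to keep a full parameter copy plus the updates accumulated since its last sync, and the exchange runs as a plain function call rather than as asynchronous messages over the network.

```python
from typing import List


class ServerGroup:
    """Hypothetical server group holding a full copy of the parameters."""

    def __init__(self, gid: int, params: List[float]) -> None:
        self.gid = gid
        self.params = list(params)
        # Updates applied locally since the last sync, per parameter element.
        self.pending = [0.0] * len(params)

    def apply_update(self, index: int, delta: float) -> None:
        """Apply an update from this group's own workers right away."""
        self.params[index] += delta
        self.pending[index] += delta


def sync_slice(groups: List[ServerGroup], master_id: int, start: int, end: int) -> None:
    """Synchronize elements [start, end), which are mastered by groups[master_id].

    The master adds the pending updates of every other group to its own copy,
    then broadcasts the result; all groups clear their pending updates.
    """
    master = groups[master_id]
    for group in groups:
        if group is master:
            continue  # the master's own updates are already in master.params
        for i in range(start, end):
            master.params[i] += group.pending[i]
    for group in groups:
        for i in range(start, end):
            group.params[i] = master.params[i]
            group.pending[i] = 0.0


if __name__ == "__main__":
    groups = [ServerGroup(0, [0.0] * 4), ServerGroup(1, [0.0] * 4)]
    groups[0].apply_update(0, -0.5)
    groups[1].apply_update(0, -0.25)
    sync_slice(groups, master_id=0, start=0, end=2)  # group 0 masters elements 0..1
    print(groups[0].params, groups[1].params)
```

In a real deployment the two loops in sync_slice would be messages sent at some sync frequency, which is the knob the issue proposes to fix at first and tune automatically later.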