[ https://issues.apache.org/jira/browse/SINGA-32?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wangwei resolved SINGA-32.
--------------------------
Resolution: Fixed
> Implement AllReduce training framework
> --------------------------------------
>
> Key: SINGA-32
> URL: https://issues.apache.org/jira/browse/SINGA-32
> Project: Singa
> Issue Type: New Feature
> Reporter: wangwei
> Assignee: wangwei
>
> The AllReduce training framework runs in synchronous mode: a worker starts
> the next iteration only after all workers have finished the previous one.
> Baidu's Deep Image system uses this training framework.
> To implement it in SINGA, we launch one worker group and one server group.
> The model is partitioned (e.g., on dimension 0) among all workers. Params are
> sliced and partitioned among all servers.
> At the beginning, each Param (slice) is put into the server shard together
> with the number of workers computing gradients for it.
> In each iteration, the local stub aggregates all gradients for the same
> Param and sends the result to the corresponding server, together with the
> number of local workers that computed gradients for it. The server buffers
> update requests and updates a Param slice only after it has received
> gradients from all workers. It then sends the updated Param (slices) back to
> the corresponding process (stub).
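
The buffering protocol described above can be illustrated with a minimal
Python sketch. It is not SINGA's actual code: the names ServerShard, put,
update and stub_aggregate are hypothetical, Param slices are plain lists of
floats, and the update rule (SGD on the averaged gradient) is an assumption,
since the issue does not say which updater is used.

```python
class ServerShard:
    """Hypothetical server-side store for Param slices (not SINGA's API)."""

    def __init__(self, lr=0.1):
        self.lr = lr        # learning rate for the assumed SGD update
        self.values = {}    # slice_id -> current Param slice values
        self.expected = {}  # slice_id -> number of workers computing gradients
        self.pending = {}   # slice_id -> buffered gradient sum and worker count

    def put(self, slice_id, values, num_workers):
        """Register a Param slice with the number of workers that will
        compute gradients for it."""
        self.values[slice_id] = list(values)
        self.expected[slice_id] = num_workers

    def update(self, slice_id, grad_sum, num_local_workers):
        """Buffer one stub's request. Return the updated slice once
        gradients from all workers have arrived, otherwise None."""
        buf = self.pending.setdefault(
            slice_id, {"grad": [0.0] * len(grad_sum), "seen": 0})
        buf["grad"] = [a + g for a, g in zip(buf["grad"], grad_sum)]
        buf["seen"] += num_local_workers
        if buf["seen"] < self.expected[slice_id]:
            return None  # still waiting for other stubs
        del self.pending[slice_id]
        n = self.expected[slice_id]
        # Assumed update rule: SGD step on the gradient averaged over workers.
        self.values[slice_id] = [
            v - self.lr * g / n
            for v, g in zip(self.values[slice_id], buf["grad"])]
        return self.values[slice_id]


def stub_aggregate(local_grads):
    """Sum the gradients of all local workers for one Param slice and
    return (summed gradient, number of local workers)."""
    return [sum(col) for col in zip(*local_grads)], len(local_grads)
```

With three workers spread over two processes, the first stub's request
(two local workers) is buffered and update returns None; the second stub's
request (one local worker) completes the count and triggers the update, whose
result would then be sent back to each stub.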
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)