[ 
https://issues.apache.org/jira/browse/SINGA-32?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangwei resolved SINGA-32.
--------------------------
    Resolution: Fixed

> Implement AllReduce training framework
> --------------------------------------
>
>                 Key: SINGA-32
>                 URL: https://issues.apache.org/jira/browse/SINGA-32
>             Project: Singa
>          Issue Type: New Feature
>            Reporter: wangwei
>            Assignee: wangwei
>
> The AllReduce training framework runs in synchronous mode: a worker starts 
> the next iteration only after all workers have finished the previous 
> iteration. Baidu's Deep Image system uses this training framework.
> To implement it in SINGA, we launch one worker group and one server group. 
> The model is partitioned (e.g., on dimension 0) among all workers. Params are 
> sliced and partitioned among all servers. 
> At the beginning, each Param (slice) is put into a server shard together 
> with the number of workers computing gradients for it.
> In each iteration, the local stub aggregates all gradients for the same 
> Param and sends the result to the corresponding server, together with the 
> number of local workers that computed gradients for it. The server buffers 
> update requests and does not update a Param slice until it has received 
> gradients from all workers. It then sends the updated Param (slice) back 
> to the corresponding process (stub).
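
The buffering protocol described above can be sketched roughly as follows. This is only an illustrative toy in Python, not SINGA code: the names ServerShard, put, update and stub_aggregate are hypothetical, Param slices are plain lists of floats, and the update rule (plain SGD on the gradient averaged over all workers) is an assumption, since the issue does not specify one.

```python
class ServerShard:
    """Toy server shard (hypothetical): stores Param slices and buffers gradients."""

    def __init__(self, lr=0.1):
        self.lr = lr          # learning rate for the assumed SGD update
        self.values = {}      # slice_id -> current Param slice
        self.expected = {}    # slice_id -> total workers computing gradients for it
        self.grad_sum = {}    # slice_id -> gradient sum buffered this iteration
        self.received = {}    # slice_id -> workers accounted for this iteration

    def put(self, slice_id, value, num_workers):
        """Register a Param slice together with its number of workers."""
        self.values[slice_id] = list(value)
        self.expected[slice_id] = num_workers
        self.grad_sum[slice_id] = [0.0] * len(value)
        self.received[slice_id] = 0

    def update(self, slice_id, grad, num_local_workers):
        """Buffer one stub's aggregated gradient.

        Returns the updated slice once gradients from all workers have
        arrived, otherwise None (the request stays buffered).
        """
        buf = self.grad_sum[slice_id]
        for i, g in enumerate(grad):
            buf[i] += g
        self.received[slice_id] += num_local_workers
        if self.received[slice_id] < self.expected[slice_id]:
            return None
        n = self.expected[slice_id]
        # Assumed update rule: SGD step on the gradient averaged over n workers.
        self.values[slice_id] = [
            v - self.lr * g / n for v, g in zip(self.values[slice_id], buf)
        ]
        # Reset the buffer for the next iteration.
        self.grad_sum[slice_id] = [0.0] * len(buf)
        self.received[slice_id] = 0
        return list(self.values[slice_id])


def stub_aggregate(local_grads):
    """Sum the gradients of all local workers for one Param slice.

    Returns (summed gradient, number of local workers).
    """
    total = [sum(col) for col in zip(*local_grads)]
    return total, len(local_grads)
```

For example, with two processes of two workers each, the first stub's call to update returns None and the second returns the updated slice, which would then be sent back to both stubs. Counting workers rather than messages lets the server detect completion regardless of how workers are grouped into processes.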



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
