[ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:
--------------------------------
    Description: 
A single node parameter server acts as a data-parallel parameter server. And a 
multi-node model parallel parameter server will be discussed if time permits. 

A diagram of the parameter server architecture is shown below.

  was:
A single node parameter server acts as a data-parallel parameter server. And a 
multi-node model parallel parameter server will be discussed if time permits. 

Synchronization:

We also need to implement the synchronization between workers and parameter 
server to be able to bring more parameter update strategies, e.g., the 
stale-synchronous strategy needs a hyperparameter "staleness" to define the 
waiting interval. The idea is to maintain a vector clock recording all workers' 
clock in the server. Each time when an iteration in side of worker finishes, it 
waits server to give a signal, i.e., to send a request for calculating the 
staleness according to the vector clock. And when the server receives the 
gradients from certain worker, it will increment the vector clock for this 
worker. So we could define BSP as "staleness==0", ASP as "staleness==-1" and 
SSP as "staleness==N".

A diagram of the parameter server architecture is shown below.


> Single-node parameter server primitives
> ---------------------------------------
>
>                 Key: SYSTEMML-2085
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
>             Project: SystemML
>          Issue Type: Technical task
>            Reporter: Matthias Boehm
>            Assignee: LI Guobao
>            Priority: Major
>         Attachments: ps.png
>
>
> A single node parameter server acts as a data-parallel parameter server. And 
> a multi-node model parallel parameter server will be discussed if time 
> permits. 
> A diagram of the parameter server architecture is shown below.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to