[ 
https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2085:
--------------------------------
    Attachment: ps.png

> Single-node parameter server primitives
> ---------------------------------------
>
>                 Key: SYSTEMML-2085
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: Matthias Boehm
>            Assignee: LI Guobao
>            Priority: Major
>         Attachments: ps.png
>
>
> A single node parameter server acts as a data-parallel parameter server. And 
> a multi-node model parallel parameter server will be discussed if time 
> permits. The idea is to run a single-node parameter server by maintaining a 
> hashmap inside the CP (Control Program) where the parameter as value 
> accompanied with a defined key. For example, inserting the global parameter 
> with a key named “worker-param-replica” allows the workers to retrieve the 
> parameter replica. Hence, in the context of local multi-threaded backend, 
> workers can communicate directly with this hashmap in the same process. And 
> in the context of Spark distributed backend, the CP firstly needs to fork a 
> thread to start a parameter server which maintains a hashmap. And secondly 
> the workers can send intermediates and retrieve parameters by connecting to 
> parameter server via TCP socket. Since SystemML has good cache management, we 
> only need to maintain the matrix reference pointing to a file location 
> instead of real data instance in the hashmap. If time permits, to be able to 
> introduce the async and staleness update strategies, we would need to 
> implement the synchronization by leveraging vector clock.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to