[ https://issues.apache.org/jira/browse/SYSTEMML-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
LI Guobao updated SYSTEMML-2085:
--------------------------------
Attachment: ps.png
> Single-node parameter server primitives
> ---------------------------------------
>
> Key: SYSTEMML-2085
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2085
> Project: SystemML
> Issue Type: Sub-task
> Reporter: Matthias Boehm
> Assignee: LI Guobao
> Priority: Major
> Attachments: ps.png
>
>
> A single-node parameter server acts as a data-parallel parameter server; a
> multi-node, model-parallel parameter server will be discussed if time
> permits. The idea is to run the single-node parameter server by maintaining a
> hashmap inside the CP (Control Program), in which each parameter is stored as
> a value under a defined key. For example, inserting the global parameter
> under the key “worker-param-replica” allows the workers to retrieve the
> parameter replica. In the local multi-threaded backend, workers can therefore
> access this hashmap directly, since they run in the same process. In the
> Spark distributed backend, the CP first needs to fork a thread that starts a
> parameter server maintaining the hashmap; the workers can then send
> intermediates and retrieve parameters by connecting to the parameter server
> over a TCP socket. Since SystemML has good cache management, the hashmap only
> needs to hold matrix references pointing to file locations rather than the
> actual data instances. If time permits, supporting the asynchronous and
> stale update strategies would require implementing the synchronization with
> vector clocks.
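
The local multi-threaded case described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not SystemML code: the class `LocalParamServer`, its `put`/`pull`/`push` methods, the constant placeholder gradient, and the learning rate are all assumptions made for illustration. Only the key name "worker-param-replica" comes from the description. Plain lists of floats stand in for the matrix references mentioned above.

```python
import threading


class LocalParamServer:
    """Hypothetical in-process parameter server backed by a plain dict."""

    def __init__(self):
        self._store = {}               # key -> parameter vector (list of floats)
        self._lock = threading.Lock()  # serializes access from worker threads

    def put(self, key, value):
        # Register (or overwrite) a parameter under the given key.
        with self._lock:
            self._store[key] = list(value)

    def pull(self, key):
        # Return a copy so workers never mutate the shared replica in place.
        with self._lock:
            return list(self._store[key])

    def push(self, key, gradient, lr=0.1):
        # Apply one gradient-descent step to the stored parameter.
        with self._lock:
            param = self._store[key]
            self._store[key] = [p - lr * g for p, g in zip(param, gradient)]


def worker(ps, key, steps):
    # Each worker repeatedly pulls the replica and pushes back an update.
    for _ in range(steps):
        params = ps.pull(key)
        gradient = [1.0 for _ in params]  # placeholder gradient, illustration only
        ps.push(key, gradient)


def run_demo(num_workers=4, steps=5):
    ps = LocalParamServer()
    ps.put("worker-param-replica", [0.0, 0.0])
    threads = [
        threading.Thread(target=worker, args=(ps, "worker-param-replica", steps))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ps.pull("worker-param-replica")


if __name__ == "__main__":
    print(run_demo())
```

Because every update is applied under a single lock, no update is lost, but workers may compute gradients from a replica that has since changed. Bounding that staleness is the part the description defers to a vector-clock-based synchronization. In the Spark case, the same `pull`/`push` operations would presumably be exposed over a TCP socket instead of being called directly.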
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)