[ 
https://issues.apache.org/jira/browse/SYSTEMML-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

LI Guobao updated SYSTEMML-2086:
--------------------------------
    Description: This part aims to design and implement a local execution 
backend for the compiled “paramserv” function. The idea is to spawn a thread in 
CP for running the parameter server. The workers are also launched in a 
multi-threaded way in CP.  (was: This part aims to design and implement a local 
execution backend for the compiled “paramserv” function. It consists of 
partitioning the data for the worker threads, launching the single-node 
parameter server in CP, shipping and calling the compiled statistical function, 
and creating the different update strategies. We will focus on implementing 
BSP execution strategies, i.e., synchronous update strategies, both per epoch 
and per batch. Other update strategies (e.g. asynchronous, stale-synchronous) 
and checkpointing strategies are optional and will be added if time permits. 
The architecture for the synchronous per-epoch update strategy is illustrated 
below.

The idea is to spawn a thread that launches a local parameter server, which is 
responsible for maintaining the parameter hashmap and executing the aggregation 
work. A number of workers are then forked according to the level of 
parallelism. Each worker loads its data partition, performs the parameter 
update per batch, pushes the gradients, and retrieves the new parameters from 
the server. The server retrieves the gradients of each worker using the related 
keys in a round-robin way, aggregates the parameters, and pushes the new global 
parameters under the parameter-related keys. Finally, the main thread of the 
paramserv function waits for the server aggregator thread to join and takes 
the last global parameters as the final result. The pull/push primitives thus 
bring more flexibility and make it easier to implement other update strategies.)
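
The push/pull scheme described above can be sketched in plain Java as follows. 
This is only a minimal illustration under stated assumptions, not SystemML 
code: the class LocalParamServSketch, the Server type, the key "w", the toy 
squared-error gradient, and the use of a CyclicBarrier for the per-batch BSP 
synchronization are all hypothetical choices made for this example.

```java
import java.util.Map;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;

public class LocalParamServSketch {
    static final String KEY = "w"; // hypothetical parameter key

    /** Toy server: keeps the parameter map and aggregates pushed gradients. */
    static class Server {
        final Map<String, Double> params = new ConcurrentHashMap<>();
        final double[] grads; // one gradient slot per worker
        final double lr;      // learning rate

        Server(int workers, double init, double lr) {
            this.grads = new double[workers];
            this.lr = lr;
            params.put(KEY, init);
        }
        void push(int workerId, double grad) { grads[workerId] = grad; }
        double pull() { return params.get(KEY); }
        // Visit the gradient slots in worker-id order and apply their average.
        void aggregate() {
            double sum = 0;
            for (double g : grads) sum += g;
            params.put(KEY, params.get(KEY) - lr * sum / grads.length);
        }
    }

    // Both barrier waits of one synchronous step, with checked exceptions wrapped.
    static void await(CyclicBarrier barrier) {
        try {
            barrier.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs `steps` synchronous per-batch updates with one worker per partition. */
    static double run(double[][] partitions, int steps, double lr) {
        int n = partitions.length;
        Server server = new Server(n, 0.0, lr);
        CyclicBarrier barrier = new CyclicBarrier(n + 1); // workers + server

        Thread serverThread = new Thread(() -> {
            for (int s = 0; s < steps; s++) {
                await(barrier);     // wait until every worker has pushed
                server.aggregate();
                await(barrier);     // release the workers to pull the new value
            }
        });
        Thread[] workers = new Thread[n];
        for (int id = 0; id < n; id++) {
            final int wid = id;
            workers[id] = new Thread(() -> {
                double w = server.pull();
                for (int s = 0; s < steps; s++) {
                    // Batch of size 1 taken from this worker's partition.
                    double x = partitions[wid][s % partitions[wid].length];
                    server.push(wid, w - x); // gradient of 0.5 * (w - x)^2
                    await(barrier);          // all gradients are pushed
                    await(barrier);          // aggregation has finished
                    w = server.pull();
                }
            });
        }
        serverThread.start();
        for (Thread t : workers) t.start();
        try {
            // The main thread waits for the server thread, then for the workers.
            serverThread.join();
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return server.pull(); // last global parameter is the final result
    }

    public static void main(String[] args) {
        double[][] data = { {1.0, 3.0}, {5.0, 7.0} };
        System.out.println(run(data, 200, 0.1));
    }
}
```

With the two toy partitions in main, the parameter should settle close to the 
overall data mean of 4.0. Swapping the barrier for a looser synchronization 
condition is where an asynchronous or stale-synchronous strategy would differ, 
while the push and pull calls stay the same.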

> Initial version of local backend
> --------------------------------
>
>                 Key: SYSTEMML-2086
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2086
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: Matthias Boehm
>            Assignee: LI Guobao
>            Priority: Major
>
> This part aims to design and implement a local execution backend for the 
> compiled “paramserv” function. The idea is to spawn a thread in CP for 
> running the parameter server. The workers are also launched in a 
> multi-threaded way in CP.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
