[ https://issues.apache.org/jira/browse/SYSTEMML-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
LI Guobao updated SYSTEMML-2086:
--------------------------------
Description: This part aims to design and implement a local execution
backend for the compiled “paramserv” function. The idea is to spawn a thread in
CP (the control program) for running the parameter server, while the workers are
also launched as multiple threads in CP. (was: This part aims to design and implement a local
execution backend for the compiled “paramserv” function. It consists of
partitioning the data for the worker threads, launching the single-node
parameter server in CP, shipping and calling the compiled statistical function,
and implementing different update strategies. We will focus on BSP (bulk
synchronous parallel) execution strategies, i.e., synchronous updates per epoch
and per batch. Other update strategies (e.g., asynchronous, stale-synchronous)
and checkpointing strategies are optional and will be added if time permits.
The architecture for the synchronous per-epoch update strategy is described
below.
The idea is to spawn a thread that launches a local parameter server, which is
responsible for maintaining the parameter hashmap and executing the aggregation
work. Then a number of workers are forked according to the level of
parallelism. Each worker loads its data partition, updates the parameters per
batch, pushes its gradients to the server, and pulls the new parameters from it.
The server retrieves the gradients of each worker via the related keys in a
round-robin manner, aggregates them into the parameters, and pushes the new
global parameters under the parameter-related keys. Finally, the main thread of
the paramserv function waits for the server aggregator thread to join and takes
the last global parameters as the final result. Building the design on these
pull/push primitives brings more flexibility and makes it easier to implement
other update strategies.)
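The architecture described above can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the SystemML implementation: the names `split_rows`, `gradient`, `LocalParamServer`, `worker` and `paramserv_local` are illustrative only. A least-squares `gradient` stands in for the compiled statistical function, a single weight list stands in for the parameter hashmap, and one queue per worker id plays the role of the related keys. Data is assumed to be partitioned into disjoint, contiguous row blocks. The sketch implements the synchronous (BSP) per-batch strategy: the server's blocking, round-robin `get()` over all workers' gradient queues acts as the barrier.

```python
import queue
import threading


def split_rows(X, y, k):
    """Split rows of (X, y) into k disjoint, contiguous blocks (assumed partitioning scheme)."""
    n = len(X)
    b = [n * i // k for i in range(k + 1)]
    return [(X[b[i]:b[i + 1]], y[b[i]:b[i + 1]]) for i in range(k)]


def gradient(w, Xb, yb):
    """Hypothetical stand-in for the compiled function: mean squared-error gradient."""
    g = [0.0] * len(w)
    for row, target in zip(Xb, yb):
        err = sum(wi * xi for wi, xi in zip(w, row)) - target
        for j, xj in enumerate(row):
            g[j] += err * xj / len(Xb)
    return g


class LocalParamServer:
    """Hypothetical single-node server: one gradient queue and one parameter queue per worker key."""

    def __init__(self, w0, num_workers, lr):
        self.w = list(w0)
        self.lr = lr
        self.grad_q = [queue.Queue() for _ in range(num_workers)]
        self.param_q = [queue.Queue() for _ in range(num_workers)]

    def push(self, wid, grad):
        self.grad_q[wid].put(grad)

    def pull(self, wid):
        return self.param_q[wid].get()  # blocks until the server publishes new parameters

    def run(self, num_rounds):
        k = len(self.grad_q)
        for _ in range(num_rounds):
            # Round-robin over worker keys; each blocking get() waits for that worker (BSP barrier).
            grads = [q.get() for q in self.grad_q]
            avg = [sum(gs) / k for gs in zip(*grads)]
            self.w = [wi - self.lr * gi for wi, gi in zip(self.w, avg)]
            for q in self.param_q:
                q.put(list(self.w))  # push a copy of the new global parameters to every worker


def worker(ps, wid, Xp, yp, w0, epochs, nb):
    """Per batch: compute a gradient on the local partition, push it, pull the new parameters."""
    w = list(w0)
    for _ in range(epochs):
        # nb batches per partition (nb <= partition size), so every worker runs the same number of rounds.
        for Xb, yb in split_rows(Xp, yp, nb):
            ps.push(wid, gradient(w, Xb, yb))
            w = ps.pull(wid)


def paramserv_local(X, y, w0, k=4, epochs=50, nb=5, lr=0.1):
    """Hypothetical driver: server thread, k worker threads, then join and return final parameters."""
    ps = LocalParamServer(w0, k, lr)
    server = threading.Thread(target=ps.run, args=(epochs * nb,), daemon=True)
    server.start()
    workers = [
        threading.Thread(target=worker, args=(ps, i, Xp, yp, w0, epochs, nb), daemon=True)
        for i, (Xp, yp) in enumerate(split_rows(X, y, k))
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    server.join()  # main thread waits for the aggregator thread, then reads the last global parameters
    return ps.w
```

Because every worker pushes exactly one gradient per round and then blocks on its own parameter queue, the result is deterministic regardless of thread scheduling. In this sketch a per-epoch variant would change only how often a worker pushes (for example, updating a local copy per batch and pushing once per epoch); that is an assumption, not necessarily how the backend realizes it.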
> Initial version of local backend
> --------------------------------
>
> Key: SYSTEMML-2086
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2086
> Project: SystemML
> Issue Type: Sub-task
> Reporter: Matthias Boehm
> Assignee: LI Guobao
> Priority: Major