[
https://issues.apache.org/jira/browse/SYSTEMML-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
LI Guobao updated SYSTEMML-2086:
--------------------------------
Description:
This task designs and implements a local execution backend for the
compiled “paramserv” function. It covers partitioning the data for worker
threads, launching the single-node parameter server in CP, shipping and
calling the compiled statistical function, and creating the different update
strategies. We will focus on BSP execution, i.e., the synchronous update
strategy, both per epoch and per batch. Other update strategies (e.g.,
asynchronous, stale-synchronous) and checkpointing strategies are optional
and will be added if time permits. The architecture for the synchronous
per-epoch update strategy is illustrated below.
The idea is to spawn a thread that launches a local parameter server, which is
responsible for maintaining the parameter hashmap and executing the aggregation
work. A number of workers are then forked according to the level of
parallelism. Each worker loads its data partition, performs the parameter
update per batch, pushes the gradients, and retrieves the new parameters from
the server. The server retrieves the gradients of each worker using the related
keys in a round-robin fashion, aggregates the parameters, and pushes the new
global parameters under the parameter-related keys. Finally, the main thread of
the paramserv function waits for the server aggregator thread to join and takes
the last global parameters as the final result. The pull/push primitives thus
bring more flexibility and make it easier to implement other update strategies.
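
The push/pull flow described above can be sketched as follows. This is only a
minimal illustration in Python, not the SystemML implementation: the names
LocalParamServer, worker and paramserv, the scalar model y = w * x, the
learning rate and the per-worker queues are all assumptions made for the
example. It shows the synchronous per-batch variant, in which the server waits
for one gradient from every worker before it updates and republishes the
global parameters.

```python
import queue
import threading


class LocalParamServer:
    """Hypothetical single-node server: keeps the parameter map and aggregates."""

    def __init__(self, params, num_workers, lr):
        self.params = dict(params)  # global parameter "hashmap"
        self.lr = lr
        # One gradient queue and one parameter queue per worker (assumed design).
        self.grad_queues = [queue.Queue() for _ in range(num_workers)]
        self.param_queues = [queue.Queue() for _ in range(num_workers)]

    def push(self, worker_id, grads):
        """Called by a worker to hand its gradients to the server."""
        self.grad_queues[worker_id].put(grads)

    def pull(self, worker_id):
        """Called by a worker; blocks until the next global parameters exist."""
        return self.param_queues[worker_id].get()

    def run(self, num_rounds):
        """Aggregator loop: one round per synchronous (BSP) update."""
        for _ in range(num_rounds):
            # Visit the workers in round-robin order; each get() blocks
            # until that worker has pushed its gradients for this round.
            grads = [q.get() for q in self.grad_queues]
            for key in self.params:
                mean_grad = sum(g[key] for g in grads) / len(grads)
                self.params[key] -= self.lr * mean_grad
            # Publish a copy of the new global parameters to every worker.
            for q in self.param_queues:
                q.put(dict(self.params))


def worker(worker_id, server, partition, params, batch_size, epochs):
    """Computes a gradient per batch, pushes it, then pulls new parameters."""
    for _ in range(epochs):
        for start in range(0, len(partition), batch_size):
            batch = partition[start:start + batch_size]
            w = params["w"]
            # Gradient of the mean squared error of the toy model y = w * x.
            grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
            server.push(worker_id, {"w": grad})
            params = server.pull(worker_id)


def paramserv(data, init_params, num_workers=2, batch_size=1, epochs=20, lr=0.02):
    """Main thread: partition data, start server and workers, await the result."""
    if len(data) % num_workers != 0:
        raise ValueError("this sketch needs equally sized partitions")
    partitions = [data[i::num_workers] for i in range(num_workers)]
    batches_per_epoch = -(-len(partitions[0]) // batch_size)  # ceiling division
    server = LocalParamServer(init_params, num_workers, lr)
    server_thread = threading.Thread(
        target=server.run, args=(epochs * batches_per_epoch,))
    worker_threads = [
        threading.Thread(target=worker,
                         args=(i, server, partitions[i], dict(init_params),
                               batch_size, epochs))
        for i in range(num_workers)
    ]
    server_thread.start()
    for t in worker_threads:
        t.start()
    for t in worker_threads:
        t.join()
    server_thread.join()  # the last global parameters are the final result
    return server.params


if __name__ == "__main__":
    samples = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
    print(paramserv(samples, {"w": 0.0}))
```

Because the workers only interact with the server through push and pull, a
different update strategy (for example per-epoch or asynchronous) would mainly
change the body of run, which is the flexibility the description refers to.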
was:
This task designs and implements a local execution backend for the
compiled “paramserv” function. It covers partitioning the data for worker
threads, launching the single-node parameter server in CP, shipping and
calling the compiled statistical function, and creating the different update
strategies. We will focus on BSP execution, i.e., the synchronous update
strategy, both per epoch and per batch. Other update strategies (e.g.,
asynchronous, stale-synchronous) and checkpointing strategies are optional
and will be added if time permits. The architecture for the synchronous
per-epoch update strategy is illustrated below.
> Initial version of local backend
> --------------------------------
>
> Key: SYSTEMML-2086
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2086
> Project: SystemML
> Issue Type: Sub-task
> Reporter: Matthias Boehm
> Assignee: LI Guobao
> Priority: Major
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)