[
https://issues.apache.org/jira/browse/SYSTEMML-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363567#comment-16363567
]
Janardhan commented on SYSTEMML-2083:
-------------------------------------
A lightweight parameter server interface is [ps-lite|https://github.com/dmlc/ps-lite], which serves as a simple example.
In simple terms, the job of a parameter server is *to calculate weights with the help of gradients* (a runnable sketch follows the steps below).
1. What does a parameter server setup look like? It consists of workers, a server, and data.
!image-2018-02-14-12-18-48-932.png!
2. What does a worker do? It takes a small slice of the data, *calculates gradients* from it, and sends
them to the server.
!image-2018-02-14-12-21-00-932.png!
3. What does the server do? It receives the gradients from the workers and *calculates the weights*.
!image-2018-02-14-12-22-39-736.png!
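To make these three steps concrete, here is a minimal, self-contained Java sketch (my own illustration, not SystemML or ps-lite code) of a single-server / k-worker loop: workers pull the current weights, compute gradients on their data slice, and push them back; the server applies an SGD step.
{code:java}
import java.util.Arrays;

/**
 * Minimal parameter-server sketch (illustration only): workers pull weights,
 * compute gradients on their slice of the data, and push them; the server
 * applies an SGD update to the global weights.
 */
public class ParamServerDemo {

    /** The server owns the global weights (the stateful part). */
    static class Server {
        private final double[] weights;
        private final double lr;

        Server(int dim, double lr) {
            this.weights = new double[dim];
            this.lr = lr;
        }

        /** Push: apply a worker's gradients to the global weights. */
        synchronized void push(double[] grad) {
            for (int i = 0; i < weights.length; i++)
                weights[i] -= lr * grad[i];
        }

        /** Pull: hand out a copy of the current weights. */
        synchronized double[] pull() {
            return Arrays.copyOf(weights, weights.length);
        }
    }

    /** A worker holds a slice of the data; here a single (x, y) example with
     *  a least-squares gradient standing in for a real mini-batch gradient. */
    static class Worker {
        private final double[] x;
        private final double y;

        Worker(double[] x, double y) { this.x = x; this.y = y; }

        double[] gradients(double[] w) {
            double err = -y;
            for (int i = 0; i < w.length; i++) err += w[i] * x[i];
            double[] g = new double[w.length];
            for (int i = 0; i < w.length; i++) g[i] = err * x[i];
            return g;
        }
    }

    public static void main(String[] args) {
        Server server = new Server(2, 0.1);
        Worker[] workers = {                          // data generated by y = 1*x1 + 2*x2
            new Worker(new double[]{1, 2}, 5),
            new Worker(new double[]{2, 1}, 4)
        };
        for (int epoch = 0; epoch < 100; epoch++)
            for (Worker wk : workers) {
                double[] w = server.pull();           // step 2: worker gets weights,
                server.push(wk.gradients(w));         //         computes + sends gradients
            }                                         // step 3: server updates weights
        System.out.println(Arrays.toString(server.pull())); // converges to ~[1.0, 2.0]
    }
}
{code}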
> Language and runtime for parameter servers
> ------------------------------------------
>
> Key: SYSTEMML-2083
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2083
> Project: SystemML
> Issue Type: Epic
> Reporter: Matthias Boehm
> Priority: Major
> Labels: gsoc2018
> Attachments: image-2018-02-14-12-18-48-932.png,
> image-2018-02-14-12-21-00-932.png
>
>
> SystemML already provides a rich set of execution strategies ranging from
> local operations to large-scale computation on MapReduce or Spark. In this
> context, we support both data-parallel (multi-threaded or distributed
> operations) as well as task-parallel computation (multi-threaded or
> distributed parfor loops). This epic aims to complement the existing
> execution strategies by language and runtime primitives for parameter
> servers, i.e., model-parallel execution. We use the terminology of
> model-parallel execution with distributed data and distributed model to
> differentiate them from the existing data-parallel operations. Target
> applications are distributed deep learning and mini-batch algorithms in
> general. These new abstractions will help make SystemML a unified framework
> for small- and large-scale machine learning that supports all three major
> execution strategies in a single framework.
>
> A major challenge is the integration of stateful parameter servers and their
> common push/pull primitives into an otherwise functional (and thus,
> stateless) language. We will approach this challenge via a new builtin
> function {{paramserv}}, which internally maintains state but at the same time
> fits into the runtime framework of stateless operations.
> Furthermore, we are interested in providing (1) different runtime backends
> (local and distributed), (2) different parameter server modes (synchronous,
> asynchronous, hogwild!, stale-synchronous), (3) different update frequencies
> (batch, multi-batch, epoch), as well as (4) different architectures for
> distributed data (1 parameter server, k workers) and distributed model (k1
> parameter servers, k2 workers).
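For intuition on point (2) in the description above, here is a rough Java sketch of how the synchronous and asynchronous modes differ in *when* gradients are applied; the class and method names are hypothetical illustrations, not the planned {{paramserv}} runtime API.
{code:java}
import java.util.List;

/** Hypothetical sketch of two parameter-server update modes. */
class UpdateModes {

    /** Synchronous (BSP): wait for all k workers, average their gradients,
     *  and apply one update per superstep; no staleness, but stragglers block. */
    static void syncUpdate(double[] weights, List<double[]> allGrads, double lr) {
        double[] avg = new double[weights.length];
        for (double[] g : allGrads)
            for (int i = 0; i < avg.length; i++)
                avg[i] += g[i] / allGrads.size();
        for (int i = 0; i < weights.length; i++)
            weights[i] -= lr * avg[i];
    }

    /** Asynchronous: apply each gradient as soon as it arrives; workers may
     *  compute on stale weights (hogwild! would even drop this lock, and
     *  stale-synchronous bounds how far behind a worker may fall). */
    static synchronized void asyncUpdate(double[] weights, double[] grad, double lr) {
        for (int i = 0; i < weights.length; i++)
            weights[i] -= lr * grad[i];
    }
}
{code}
The familiar trade-off: synchronous updates give deterministic convergence behavior but wait for the slowest worker, while asynchronous variants trade gradient staleness for throughput.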