[jira] [Commented] (SYSTEMML-2083) Language and runtime for parameter servers

Matthias Boehm (JIRA) Tue, 13 Feb 2018 21:48:25 -0800

    [ 
https://issues.apache.org/jira/browse/SYSTEMML-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363526#comment-16363526
 ]


Matthias Boehm commented on SYSTEMML-2083:
------------------------------------------

Awesome [~chamath], that sounds very good. As next steps for both of you, 
[~chamath] and [~mpgovinda], I would recommend the following:

1) SystemML: Familiarize yourself with SystemML (e.g., read the paper referenced 
above), the documentation, its algorithms (including the nn library), and maybe 
run a simple linear regression algorithm over dense and sparse matrices with 
the following data generators and algorithm scripts:
https://github.com/apache/systemml/blob/master/scripts/datagen/genRandData4LinearRegression.dml
https://github.com/apache/systemml/blob/master/scripts/algorithms/LinearRegCG.dml

2) Understanding the Problem: Unless you're already familiar with typical 
parameter server architectures, I would recommend starting from a recent paper 
and its references (e.g., 
https://ds3lab.org/wp-content/uploads/2017/07/sigmod2017_jiang.pdf, which does 
a good job of summarizing existing systems). Ultimately, we want to build 
compiler and runtime support for multiple different update strategies, so ask 
yourself whether you would be interested in contributing to the internals of 
SystemML.

3) Project Discussion: Subsequently, we would discuss the actual project in 
more detail. This epic is large enough to allow multiple interesting 
subprojects on which individual students can work. Based on your ideas, 
collaboration preferences, and technical interests, we can scope these projects 
accordingly. The goal is to work toward a high-quality project proposal in an 
interactive manner.

4) GSoC Application: According to the GSoC timeline, you would then submit 
your proposal by March 27 to the ASF as the mentoring organization. For more 
details, please see http://community.apache.org/gsoc.html.

> Language and runtime for parameter servers
> ------------------------------------------
>
>                 Key: SYSTEMML-2083
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2083
>             Project: SystemML
>          Issue Type: Epic
>            Reporter: Matthias Boehm
>            Priority: Major
>              Labels: gsoc2018
>
> SystemML already provides a rich set of execution strategies ranging from 
> local operations to large-scale computation on MapReduce or Spark. In this 
> context, we support both data-parallel (multi-threaded or distributed 
> operations) as well as task-parallel computation (multi-threaded or 
> distributed parfor loops). This epic aims to complement the existing 
> execution strategies by language and runtime primitives for parameter 
> servers, i.e., model-parallel execution. We use the terminology of 
> model-parallel execution with distributed data and distributed model to 
> differentiate it from the existing data-parallel operations. Target 
> applications are distributed deep learning and mini-batch algorithms in 
> general. These new abstractions will help make SystemML a unified framework 
> for small- and large-scale machine learning that supports all three major 
> execution strategies in a single framework.
>  
> A major challenge is the integration of stateful parameter servers and their 
> common push/pull primitives into an otherwise functional (and thus, 
> stateless) language. We will approach this challenge via a new builtin 
> function {{paramserv}} which internally maintains state but at the same time 
> fits into the runtime framework of stateless operations.
> Furthermore, we are interested in providing (1) different runtime backends 
> (local and distributed), (2) different parameter server modes (synchronous, 
> asynchronous, hogwild!, stale-synchronous), (3) different update frequencies 
> (batch, multi-batch, epoch), as well as (4) different architectures for 
> distributed data (1 parameter server, k workers) and distributed model (k1 
> parameter servers, k2 workers). 
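
To make the push/pull primitives and the synchronous mode described above more 
concrete, here is a rough sketch in plain Python. It is not SystemML code and 
says nothing about how {{paramserv}} will actually be designed: the names 
ParamServer, pull, push, barrier, gradient, make_data, and train are all 
hypothetical, the workers run one after another in a single process instead of 
in parallel, and the toy model is a two-parameter linear regression. It only 
illustrates the "distributed data, 1 parameter server, k workers" architecture 
with one update per mini-batch.

```python
from typing import List, Tuple


class ParamServer:
    """Hypothetical in-process parameter server: the only holder of model state."""

    def __init__(self, weights: List[float], lr: float) -> None:
        self.weights = list(weights)
        self.lr = lr
        self._pending: List[List[float]] = []  # gradients pushed since last update

    def pull(self) -> List[float]:
        # Return a copy so workers never mutate server state directly.
        return list(self.weights)

    def push(self, grad: List[float]) -> None:
        self._pending.append(list(grad))

    def barrier(self) -> None:
        # Synchronous mode: average all pushed gradients, then apply one update.
        if not self._pending:
            return
        k = len(self._pending)
        for j in range(len(self.weights)):
            self.weights[j] -= self.lr * sum(g[j] for g in self._pending) / k
        self._pending.clear()


def gradient(w: List[float], batch: List[Tuple[float, float]]) -> List[float]:
    """Gradient of the mean squared error of y ~ w[0]*x + w[1] on one batch."""
    g0 = g1 = 0.0
    for x, y in batch:
        err = w[0] * x + w[1] - y
        g0 += 2.0 * err * x
        g1 += 2.0 * err
    return [g0 / len(batch), g1 / len(batch)]


def make_data(n: int = 16) -> List[Tuple[float, float]]:
    # Noise-free toy data on the line y = 2x + 1, with x in [0, 1].
    return [(i / (n - 1), 2.0 * i / (n - 1) + 1.0) for i in range(n)]


def train(data: List[Tuple[float, float]], k_workers: int = 2,
          batch_size: int = 4, epochs: int = 2000,
          lr: float = 0.3) -> List[float]:
    ps = ParamServer([0.0, 0.0], lr)
    # Distributed data, one server: each worker gets a disjoint row partition.
    parts = [data[i::k_workers] for i in range(k_workers)]
    n_batches = min(len(p) for p in parts) // batch_size
    for _ in range(epochs):
        for b in range(n_batches):
            for part in parts:  # simulated workers; real ones would run in parallel
                w = ps.pull()
                ps.push(gradient(w, part[b * batch_size:(b + 1) * batch_size]))
            ps.barrier()  # update frequency: once per mini-batch
    return ps.pull()
```

Calling train(make_data()) returns the fitted [slope, intercept] pair. In this 
sketch an asynchronous mode would roughly correspond to applying each gradient 
inside push instead of waiting for barrier, and an epoch-level update frequency 
to calling barrier only once per pass over the data.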



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
