Matthias Boehm created SYSTEMML-2083:
----------------------------------------

             Summary: Language and runtime for parameter servers
                 Key: SYSTEMML-2083
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2083
             Project: SystemML
          Issue Type: Epic
            Reporter: Matthias Boehm


SystemML already provides a rich set of execution strategies ranging from local
operations to large-scale computation on MapReduce or Spark. In this context,
we support both data-parallel computation (multi-threaded or distributed
operations) and task-parallel computation (multi-threaded or distributed parfor
loops). This epic aims to complement the existing execution strategies with
language and runtime primitives for parameter servers, i.e., model-parallel
execution. We use the term model-parallel execution, covering both distributed
data and distributed model, to differentiate these primitives from the existing
data-parallel operations. Target applications are distributed deep learning and
mini-batch algorithms in general. These new abstractions will help make
SystemML a unified framework for small- and large-scale machine learning that
supports all three major execution strategies (data-, task-, and model-parallel).


A major challenge is the integration of stateful parameter servers and their
common push/pull primitives into an otherwise functional (and thus stateless)
language. We will approach this challenge via a new built-in function
{{paramserv}} that internally maintains state but, at the same time, fits into
the runtime framework of stateless operations.
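To illustrate how a stateful parameter server could hide behind a call that looks stateless to its caller, here is a minimal Python sketch. All names (ParameterServer, pull, push, paramserv, least_squares_grad) and parameters are hypothetical and are not the SystemML API; the k workers are simulated sequentially in round-robin order rather than run concurrently, and the server applies plain SGD updates.

```python
from typing import Callable, List, Sequence

Vector = List[float]
GradFn = Callable[[Vector, List[Vector], List[float]], Vector]


class ParameterServer:
    """Hypothetical in-memory parameter server; holds the only mutable model copy."""

    def __init__(self, model: Sequence[float], lr: float) -> None:
        self._model: Vector = list(model)  # private copy, caller's list is untouched
        self._lr = lr

    def pull(self) -> Vector:
        # Workers get a snapshot, never a reference to the server state.
        return list(self._model)

    def push(self, grad: Sequence[float]) -> None:
        # Apply a plain SGD step to the server-side model.
        self._model = [w - self._lr * g for w, g in zip(self._model, grad)]


def least_squares_grad(w: Vector, X: List[Vector], y: List[float]) -> Vector:
    """Gradient of 0.5 * mean((x.w - y)^2) over one mini-batch."""
    m = len(X)
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        err = sum(a * b for a, b in zip(xi, w)) - yi
        for d, xd in enumerate(xi):
            grad[d] += err * xd / m
    return grad


def paramserv(model: Sequence[float], features: List[Vector], labels: List[float],
              grad_fn: GradFn, lr: float = 0.1, epochs: int = 10,
              batch_size: int = 2, k: int = 2) -> Vector:
    """Hypothetical builtin: stateless signature, stateful server inside (k >= 1)."""
    ps = ParameterServer(model, lr)
    n = len(features)
    # Disjoint row partitions, one per simulated worker.
    parts = [list(range(j, n, k)) for j in range(k)]
    batches = [[p[b:b + batch_size] for b in range(0, len(p), batch_size)] for p in parts]
    for _ in range(epochs):
        # Round-robin over workers: each pulls, computes a gradient, and pushes.
        for step in range(max(len(bs) for bs in batches)):
            for bs in batches:
                if step < len(bs):
                    rows = bs[step]
                    w = ps.pull()
                    g = grad_fn(w, [features[i] for i in rows], [labels[i] for i in rows])
                    ps.push(g)
    return ps.pull()  # only the final model leaves the call


if __name__ == "__main__":
    w0 = [0.0]
    w = paramserv(w0, [[1.0], [2.0], [3.0], [4.0]], [2.0, 4.0, 6.0, 8.0],
                  least_squares_grad, lr=0.05, epochs=20)
    print(w, w0)
```

In this sketch the mutable model exists only inside the call: paramserv copies its input and returns only the final model, so from the outside it composes like any other function without side effects.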

Furthermore, we are interested in providing (1) different runtime backends
(local and distributed), (2) different parameter server modes (synchronous,
asynchronous, Hogwild!, stale-synchronous), (3) different update frequencies
(batch, multi-batch, epoch), and (4) different architectures for distributed
data (1 parameter server, k workers) and distributed model (k1 parameter
servers, k2 workers).
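The modes and update frequencies listed above can be made concrete with a small Python sketch. It assumes the common stale-synchronous formulation in which each worker keeps a clock (its number of completed iterations) and may run at most s clocks ahead of the slowest worker: synchronous then corresponds to s = 0 and asynchronous to an unbounded s. Hogwild! is modeled here only as unbounded staleness plus lock-free server updates. The enum names, the default s = 3, the multi-batch size m = 4, and all functions are hypothetical illustrations, not SystemML parameters.

```python
import math
from enum import Enum
from typing import Sequence


class Mode(Enum):
    """Parameter server modes named in the epic (identifiers are hypothetical)."""
    SYNC = "synchronous"
    ASYNC = "asynchronous"
    HOGWILD = "hogwild!"
    SSP = "stale-synchronous"


class Freq(Enum):
    """Update frequencies named in the epic (identifiers are hypothetical)."""
    BATCH = "batch"
    MULTI_BATCH = "multi-batch"
    EPOCH = "epoch"


def staleness_bound(mode: Mode, s: int = 3) -> float:
    # SYNC acts as a barrier, SSP allows s clocks of lead, ASYNC/HOGWILD are unbounded.
    if mode is Mode.SYNC:
        return 0
    if mode is Mode.SSP:
        return s
    return math.inf


def may_proceed(mode: Mode, worker_clock: int, all_clocks: Sequence[int], s: int = 3) -> bool:
    """A worker may start its next iteration iff it is at most the bound ahead of the slowest."""
    return worker_clock - min(all_clocks) <= staleness_bound(mode, s)


def uses_lock(mode: Mode) -> bool:
    # Hogwild! applies updates to the shared model without locking.
    return mode is not Mode.HOGWILD


def push_due(freq: Freq, batch_idx: int, batches_per_epoch: int, m: int = 4) -> bool:
    """Should a worker push after finishing 0-based batch `batch_idx` of the current epoch?"""
    last = batch_idx + 1 == batches_per_epoch
    if freq is Freq.BATCH:
        return True
    if freq is Freq.MULTI_BATCH:
        return (batch_idx + 1) % m == 0 or last  # every m batches, flush at epoch end
    return last  # Freq.EPOCH


if __name__ == "__main__":
    print(may_proceed(Mode.SYNC, 1, [1, 0]))  # barrier: the faster worker must wait
    print(may_proceed(Mode.SSP, 3, [3, 0]))   # still within the staleness bound s = 3
    print([push_due(Freq.MULTI_BATCH, i, 10) for i in range(10)])
```

Under this formulation the consistency modes differ only in the staleness bound and in whether server updates are locked, while the update frequency is an independent, per-worker choice; this separation is one plausible way to expose both as orthogonal options of a single builtin.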



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
