RE: Proposal for an Apache Hama sub-project

2017-02-27 Thread Sachin Ghai
Thank you, Edward, for the further feedback, and congratulations on the new job.

REST-based prediction serving would be a key part of this distributed system 
design, and services would be independently deployable (for instance, a 
K-means service could be deployed independently of a regression service). I 
believe it would be good to draw on proven patterns from microservices 
architecture and make design decisions accordingly as we proceed.
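
To make that concrete, below is a minimal sketch of what one such 
independently deployable service could look like, using only the JDK's 
built-in HTTP server. The class name, port, and endpoint path are 
illustrative assumptions, not a committed design:

    // Minimal sketch: an independently deployable K-means prediction
    // service. Class name, port, and path are illustrative assumptions.
    import com.sun.net.httpserver.HttpServer;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;

    public class KMeansPredictionService {
        public static void main(String[] args) throws Exception {
            HttpServer server = HttpServer.create(new InetSocketAddress(8081), 0);
            // The K-means service owns its own endpoint and process, so it
            // can be deployed, scaled, and versioned independently of,
            // say, a regression service running elsewhere.
            server.createContext("/predict/kmeans", exchange -> {
                exchange.getRequestBody().readAllBytes(); // feature payload, ignored in this stub
                byte[] response = "{\"cluster\": 0}".getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, response.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(response);
                }
            });
            server.start();
        }
    }

A regression service would expose its own endpoint in a separate process, 
which is exactly the independence property described above.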

I am also requesting feedback from other members of the Hama community in 
this discussion, and I look forward to co-creating Scalar.

Thanks,
Sachin Ghai


RE: Proposal for an Apache Hama sub-project

2017-02-27 Thread Edward J. Yoon
Thanks for your proposal.

I of course think Apache Hama can be used for scheduling sync and async
communication/computation networks with various topologies and resource
allocations. However, I'm not sure whether this approach is also a fit for a
modern microservice architecture. In my opinion, this can be discussed and
incubated in the Hama community as a sub-project until it's mature enough
(CC'ing general@i.a.o; I'll be happy to read more feedback from the ASF
Incubator community).

P.S. It seems you referred to the incubation proposal template. There's no
need to add me as an initial committer (I don't have much time to contribute
actively to your project). Also, I recently left Samsung Electronics and
joined a $200 billion O2O e-commerce company as CTO.

Proposal for an Apache Hama sub-project

2017-02-27 Thread Sachin Ghai
Hama Community,

I would like to propose a sub-project for Apache Hama and initiate discussion 
around the proposal. The proposed sub-project, named 'Scalar', is a scalable 
orchestration, training, and serving system for machine learning and deep 
learning. Scalar would leverage Apache Hama to automate distributed training, 
model deployment, and prediction serving.

More details about the proposal are listed below, as per the Apache project 
proposal template:
Abstract
Scalar is a general-purpose framework for simplifying massive-scale big data 
analytics and deep learning modelling, deployment, and serving with high 
performance.
Proposal
It is a goal of Scalar to provide an abstraction framework that allows users 
to easily scale the functions of training a model, deploying a model, and 
serving predictions from the underlying machine learning or deep learning 
framework. It is also a characteristic of its execution framework to 
orchestrate heterogeneous workload graphs utilizing Apache Hama, Apache 
Hadoop, Apache Spark, and TensorFlow resources.
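
As one possible shape for this abstraction (purely illustrative; the 
interface and method names below are assumptions, not part of Scalar or any 
existing framework), the user-facing contract could look like this, with one 
implementation per underlying engine:

    import java.util.Map;

    // Hypothetical sketch of a backend-neutral contract. An
    // implementation would exist per engine (TensorFlow, Spark, ...),
    // so callers code against one interface and swap engines by
    // configuration.
    public interface ModelBackend {
        // Train the named algorithm on the given data and return a
        // handle identifying the trained model artifact.
        String train(String algorithm, Map<String, String> params, String dataUri);

        // Deploy a trained model so it can serve predictions at scale.
        void deploy(String modelHandle);

        // Serve a single prediction against a deployed model.
        double[] predict(String modelHandle, double[] features);
    }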
Background
The initial Scalar code was developed in 2016 and has been successfully beta 
tested for one of the largest insurance organizations in a client-specific 
PoC. The motivation behind this work is to build a framework that provides an 
abstraction over heterogeneous data science frameworks and helps users 
leverage them in the most performant way.
Rationale
There is a sudden deluge of machine learning and deep learning frameworks in 
the industry. For an application developer, it is a hard choice to switch 
from one framework to another without rewriting the application. There is 
also additional plumbing to be done to retrieve the prediction results for 
each model in different frameworks. We aim to provide an abstraction 
framework that can be used to seamlessly train and deploy models at scale on 
multiple frameworks such as TensorFlow, Apache Horn, or Caffe. The 
abstraction further provides a unified layer for serving predictions in the 
most performant, scalable, and efficient way for a multi-tenant deployment. 
The key performance metrics will be reduced training time, lower error rates, 
and lower serving latency.
Scalar consists of a core engine that can be used to create flows described 
in terms of states, sequences, and algorithms. The engine invokes the 
execution context of Apache Hama to train and deploy models on the target 
framework. Apache Hama is used for a variety of functions, including 
parameter tuning and scheduling computations on a distributed cluster. A data 
object layer provides access to data from heterogeneous sources such as HDFS, 
the local filesystem, and S3. A REST API layer serves the prediction 
functions to client applications, and a caching layer in the middle reduces 
latency for various functions.
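
To make the flows-as-states-and-sequences idea concrete, here is a minimal, 
self-contained sketch of how such a flow definition could look. The Flow 
class and the step contents are assumptions for illustration only; a real 
engine would hand each step to an Apache Hama execution context for 
distributed scheduling rather than running it inline:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Hypothetical sketch: a flow is an ordered set of named steps
    // (states) executed in sequence. Nothing here is an existing
    // Scalar or Apache Hama API.
    public class Flow {
        private final String name;
        private final Map<String, Runnable> steps = new LinkedHashMap<>();

        private Flow(String name) { this.name = name; }

        public static Flow named(String name) { return new Flow(name); }

        public Flow step(String stepName, Runnable action) {
            steps.put(stepName, action);
            return this;
        }

        public void run() {
            // Execute states in their declared sequence; the proposal's
            // engine would schedule these on a cluster via Apache Hama.
            steps.forEach((stepName, action) -> {
                System.out.println(name + ": entering state '" + stepName + "'");
                action.run();
            });
        }

        public static void main(String[] args) {
            Flow.named("kmeans-claims")
                .step("train",  () -> System.out.println("train k-means on hdfs:///data/input"))
                .step("deploy", () -> System.out.println("deploy model to serving layer"))
                .step("serve",  () -> System.out.println("expose REST endpoint /predict/kmeans"))
                .run();
        }
    }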
Initial Goals
Some current goals include:

  *   Build community.
  *   Provide a general-purpose API for machine learning and deep learning 
training, deployment, and serving.
  *   Serve predictions with low latency.
  *   Run massive workloads via Apache Hama on TensorFlow, Apache Spark, and 
Caffe.
  *   Provide CPU and GPU support on-premises or in the cloud to run the 
algorithms.
Current Status
Meritocracy
The core developers understand what it means to have a process based on 
meritocracy. We will make continuous efforts to build an environment that 
supports this, encouraging community members to contribute.
Community
A small community has formed within the Apache Hama project and at companies 
such as an enterprise services and products company and an artificial 
intelligence startup. There is a lot of interest in data science serving 
systems and artificial intelligence simplification systems. By bringing 
Scalar into Apache, we believe that the community will grow even bigger.
Core Developers
Edward J. Yoon, Sachin Ghai, Ishwardeep Singh, Rachna Gogia, Abhishek Soni, 
Nikunj Limbaseeya, Mayur Choubey
Known Risks
Orphaned Products
Apache Hama is already a core open source component utilized at Samsung 
Electronics, and Scalar is already being adopted by major enterprise 
organizations. There is no direct risk of the Scalar project being orphaned.
Inexperience with Open Source
All contributors have experience using and/or working on Apache open source 
projects.
Homogeneous Developers
The initial committers are from different organizations such as Impetus, Chalk 
Digital, and Samsung Electronics.
Reliance on Salaried Developers
A few contributors will be working as full-time open source developers. Other 
developers will also work on the project in their spare time.
Relationships with Other Apache Products

  *   Scalar is being built on top of Apache Hama.
  *   Apache Spark is being used for machine learning.
  *   Apache Horn is being used for deep learning.
  *   The framework will run natively on Apache Hadoop and Apache Mesos.
An Excessive Fascination with the Apache Brand
Scalar itself will