Hi,

I've put together a proof of concept for making DML a first-class
citizen in Apache Zeppelin.

Brief intro to Zeppelin -
Zeppelin is a "notebook" interface to interact with Spark, Cassandra, Hive
and other projects. It can be thought of as a REPL in a browser.
Small units of code are put into "cells". These individual cells can then
be run interactively. There is also support for queueing up cells and
running them in parallel.
Cells are contained in notebooks. Notebooks can be exported and are
persistent between sessions.

One can write (Scala) Spark code in cell 1 and save a data frame object,
then write PySpark code in cell 2 and access the previously saved
data frame.
The Zeppelin runtime makes this possible by injecting a special variable
called "z" into the Spark and PySpark environments. This "z" is an object
of type ZeppelinContext and provides a "get" and a "put" method.
DML in Spark mode can now access this feature as well.
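
To make the put/get idea concrete, here is a rough sketch in plain Python.
It does not use Zeppelin at all: SharedContext is a hypothetical stand-in
for the "z" object, and cell_1 / cell_2 are made-up functions that play the
role of two notebook cells. A plain list of tuples stands in for the data
frame. The real ZeppelinContext does more than this.

```python
class SharedContext:
    """Hypothetical stand-in for the object Zeppelin injects as `z`."""

    def __init__(self):
        # Named objects shared between cells (assumption: a simple dict).
        self._pool = {}

    def put(self, name, value):
        # Store an object under a name so another cell can fetch it.
        self._pool[name] = value

    def get(self, name):
        # Return the stored object, or None if nothing was saved.
        return self._pool.get(name)


z = SharedContext()


def cell_1(z):
    # Plays the role of the (Scala) Spark cell: build data and save it.
    rows = [("a", 1), ("b", 2)]
    z.put("saved_rows", rows)


def cell_2(z):
    # Plays the role of the PySpark cell: read what cell 1 saved.
    rows = z.get("saved_rows")
    return sum(count for _, count in rows)


cell_1(z)
total = cell_2(z)
```

In an actual notebook the two cells would run under different interpreters
and simply call z.put(...) and z.get(...) on the injected variable.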

In this POC, DML can operate in 2 modes - standalone and spark.

Screenshots of it working:
http://imgur.com/a/m7ASx

GIF of the screenshots:
http://i.imgur.com/NttMuKC.gifv

Instructions:
https://gist.github.com/anonymous/6ab8c569b2360232e252

JIRA:
https://issues.apache.org/jira/browse/SYSTEMML-542


Nakul Jindal
