Hi All,

Niketan, this feedback is much appreciated and I will continue to work on this. In the meantime, some of the other (offline) feedback I got included making DML variables accessible across DML cells. Towards that end, I've made some improvements to the Zeppelin-DML integration. There is also a convenient (albeit large, ~2GB) docker image to test this out with.
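As a quick illustration, sharing variables across DML cells might look something like this (a hypothetical sketch - the cell contents below are illustrative and assume the %spark.dml interpreter from this POC; exact syntax may differ):

```
%spark.dml
# Cell 1: define a scalar and a matrix in DML
n = 10
X = rand(rows=n, cols=n)

%spark.dml
# Cell 2: variables defined in cell 1 can be read here
s = sum(X)
print("sum of X = " + s)
```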
All the information is on the JIRA: https://issues.apache.org/jira/browse/SYSTEMML-542
It has screenshots, docker instructions and steps to recreate the dev environment to play with.

These are the features (thus far):
- Launch a standalone DML cell which runs the DML interpreter locally (using %dml)
-- This has rudimentary features and will be developed further if there is demand
- Launch a DML cell which runs on Spark (using %spark.dml)
- Transfer data between Spark, PySpark, etc. and DML cells (as DataFrames)
-- Read data in a Spark cell (as a DataFrame) and use it in a DML cell
-- Write a DML matrix in a DML cell and read it as a DataFrame in a Spark cell
-- This is done using ZeppelinContext (https://zeppelin.incubator.apache.org/docs/latest/interpreter/spark.html)
- Transfer data between DML cells - scalar types (booleans, strings, floats, integers) and matrices
-- Any variable defined in a cell can be used (read from/written to) in subsequent cells.
-- This is very similar to how Spark cells operate.

Any feedback is greatly appreciated.

Thanks,
Nakul Jindal

On Tue, Mar 8, 2016 at 10:30 AM, Niketan Pansare <npan...@us.ibm.com> wrote:

> Hi Nakul,
>
> This is good work!
>
> My 2 cents: we should add missing features (such as command-line
> arguments), document the API for this POC, come up with examples for
> existing algorithms with open-source datasets, and put them in
> https://github.com/apache/incubator-systemml/tree/master/samples/zeppelin-notebooks
>
> This way, people are encouraged to try out (and maybe even modify
> on the fly) existing DML algorithms with specific datasets. Borrowing
> an example from
> http://scikit-learn.org/stable/tutorial/basic/tutorial.html:
>
> >>> from sklearn import datasets
> >>> iris = datasets.load_iris()
> >>> digits = datasets.load_digits()
> >>> from sklearn import svm
> >>> clf = svm.SVC(gamma=0.001, C=100.)
> >>> clf.fit(digits.data[:-1], digits.target[:-1])
> >>> clf.predict(digits.data[-1:])
>
> We can then put a link to the given example in
> http://apache.github.io/incubator-systemml/algorithms-classification.html#support-vector-machines
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
> From: Nakul Jindal <naku...@gmail.com>
> To: dev@systemml.incubator.apache.org
> Date: 03/06/2016 07:22 PM
> Subject: DML in Zeppelin
>
> Hi,
>
> I've put together a proof of concept for having DML be a first-class
> citizen in Apache Zeppelin.
>
> Brief intro to Zeppelin:
> Zeppelin is a "notebook" interface to interact with Spark, Cassandra, Hive
> and other projects. It can be thought of as a REPL in a browser.
> Small units of code are put into "cells". These individual cells can then
> be run interactively, and there is support for queueing up and running
> cells in parallel.
> Cells are contained in notebooks. Notebooks can be exported and are
> persistent between sessions.
>
> One can type (Scala) Spark code in cell 1 and save a DataFrame object,
> then type PySpark code in cell 2 and access the previously saved
> DataFrame.
> This is done by the Zeppelin runtime by injecting a special variable
> called "z" into the Spark and PySpark environments in Zeppelin. This "z"
> is an object of type ZeppelinContext and makes available a "get" and a
> "put" method.
> DML in Spark mode can now access this feature as well.
>
> In this POC, DML can operate in 2 modes - standalone and spark.
>
> Screenshots of it working:
> http://imgur.com/a/m7ASx
>
> GIF of the screenshots:
> http://i.imgur.com/NttMuKC.gifv
>
> Instructions:
> https://gist.github.com/anonymous/6ab8c569b2360232e252
>
> JIRA:
> https://issues.apache.org/jira/browse/SYSTEMML-542
>
> Nakul Jindal