Hi Nakul,

This is good work!
My 2 cents: we should add the missing features (such as command-line arguments), document the API for this POC, come up with examples for existing algorithms with open-source datasets, and put them in https://github.com/apache/incubator-systemml/tree/master/samples/zeppelin-notebooks

This way, people are encouraged to try out (and maybe even modify on the fly) the existing DML algorithms with specific datasets.

Borrowing an example from http://scikit-learn.org/stable/tutorial/basic/tutorial.html :

>>> from sklearn import datasets
>>> iris = datasets.load_iris()
>>> digits = datasets.load_digits()
>>> from sklearn import svm
>>> clf = svm.SVC(gamma=0.001, C=100.)
>>> clf.fit(digits.data[:-1], digits.target[:-1])
>>> clf.predict(digits.data[-1:])

We can then put a link to the given example in http://apache.github.io/incubator-systemml/algorithms-classification.html#support-vector-machines

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar

From: Nakul Jindal <[email protected]>
To: [email protected]
Date: 03/06/2016 07:22 PM
Subject: DML in Zeppelin

Hi,

I've put together a proof of concept for making DML a first-class citizen in Apache Zeppelin.

Brief intro to Zeppelin: Zeppelin is a "notebook" interface for interacting with Spark, Cassandra, Hive and other projects. It can be thought of as a REPL in a browser. Small units of code are put into "cells". These individual cells can then be run interactively. Of course, there is also support for queueing up cells and running them in parallel. Cells are contained in notebooks. Notebooks can be exported and persist between sessions.

One can type (Scala) Spark code in cell 1 and save a data frame object, then type PySpark code in cell 2 and access the previously saved data frame. The Zeppelin runtime makes this possible by injecting a special variable called "z" into the Spark and PySpark environments in Zeppelin.
This "z" is an object of type ZeppelinContext and makes available a "get" and a "put" method. DML in Spark mode can now access this feature as well.

In this POC, DML can operate in 2 modes: standalone and spark.

Screenshots of it working: http://imgur.com/a/m7ASx
GIF of the screenshots: http://i.imgur.com/NttMuKC.gifv
Instructions: https://gist.github.com/anonymous/6ab8c569b2360232e252
JIRA: https://issues.apache.org/jira/browse/SYSTEMML-542

Nakul Jindal
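
For readers new to the idea, the cross-cell sharing through "z" described in the message above can be pictured roughly as follows. This is only a sketch of the put/get pattern in plain Python: the SharedContext class, the cell_1 and cell_2 functions, and the "my_df" key are hypothetical stand-ins made up for illustration, not Zeppelin's ZeppelinContext and not the POC's code.

```python
# Hypothetical stand-in for the object Zeppelin injects as "z".
# It only mimics the put/get idea; it is NOT the real ZeppelinContext.
class SharedContext:
    def __init__(self):
        self._store = {}

    def put(self, name, value):
        """Save a value under a name so that later cells can read it."""
        self._store[name] = value

    def get(self, name):
        """Return the value saved under a name (KeyError if absent)."""
        return self._store[name]


def cell_1(z):
    # Stands in for a (Scala) Spark cell that saves a data frame.
    rows = [("a", 1), ("b", 2)]
    z.put("my_df", rows)


def cell_2(z):
    # Stands in for a PySpark cell that reads what cell 1 saved.
    rows = z.get("my_df")
    return sum(count for _, count in rows)


z = SharedContext()
cell_1(z)
total = cell_2(z)
print(total)
```

In a real notebook the two cells would run in different interpreters, and the runtime, not the user, would create and inject "z"; here a single Python process plays both roles.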
