RE 702:
I wanted to respond to Eric’s discussion from December 30.
I finally set aside a good chunk of dedicated, uninterrupted time, which meant
I had a chance to really dig into this with a Data Science R developer hat on.
I also thought about this from a DevOps point of view (deploying in an EC2
cluster, standalone, locally, VM).
I tested it with a Spark installation outside of the Zeppelin build, as if it
were running on a cluster or a standalone install.
I also had a chance to dig under the hood a bit, and explore what the
Java/Scala code in PR 702 is doing.
I like the simplicity of this PR (the source code and approach).
It works as expected: all graphics work, and the interactive charts work.
I also see your point about rendering the text result vs. a TABLE plot when
the R interpreter result is a data frame.
To confirm: the approach is to use %sql to display it in a native Zeppelin
visualization.
Your approach makes sense, since it is in line with how this works in other
Zeppelin workflows.
I suppose you could add an R interpreter function, such as
z.R.showDFAsTable(fooDF), if we wanted to force the data frame into a %table
without having to jump to %sql (perhaps a nice addition in this or a future PR).
It’s GREAT that %r print('%html') works with the Zeppelin display system (as
well as the other display system methods)!
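For reference, the display-system usage I tested was along these lines (the
HTML content itself is illustrative):

```
%r
print("%html <h3>Hello from R via the Zeppelin display system</h3>")
```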
Regarding the rscala jar: you have a profile that will allow us to sync up the
rscala version, so that makes sense as well.
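If the profile exposes the jar version as a build property, syncing it could
look something like this (the profile name `r` and the property name
`rscala.version` are my assumptions, not necessarily what the PR uses):

```
mvn clean package -Pr -Drscala.version=1.0.6 -DskipTests
```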
This too worked as expected. I specifically installed rscala (as you describe
in your docs) in the VM with:
curl https://cran.r-project.org/src/contrib/Archive/rscala/rscala_1.0.6.tar.gz \
  -o /tmp/rscala_1.0.6.tar.gz
R CMD INSTALL /tmp/rscala_1.0.6.tar.gz
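For the VM script, the install commands can be wrapped in a small sketch that
pins the version in one place (the RSCALA_VERSION variable and the dry-run
echoes are my additions, not part of the PR):

```shell
#!/bin/sh
set -e
# Pin the rscala version in one place so the docs and VM script stay in sync.
RSCALA_VERSION="1.0.6"
TARBALL="/tmp/rscala_${RSCALA_VERSION}.tar.gz"
URL="https://cran.r-project.org/src/contrib/Archive/rscala/rscala_${RSCALA_VERSION}.tar.gz"

# Dry run: print what would be executed.
echo "curl $URL -o $TARBALL"
echo "R CMD INSTALL $TARBALL"
# To actually install, run the two commands printed above.
```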
Installing rscala outside of the Zeppelin dependencies does seem to keep this
PR simpler, and it reduces the licensing overhead required to get this PR
through (based on comments I see from others).
I will need to add the two rscala install lines above to PR #751 (I will add
this today):
https://github.com/apache/incubator-zeppelin/pull/751
Regarding the interpreters: just having %r as our first interpreter keyword
makes sense. Loading knitr within the interpreter to enable rendering (versus
having a dedicated %knitr interpreter) seems to keep things simple.
In summary: this looks good, since everything in your sample R notebook (as
well as a few other tests I tried) worked for me using the VM script in PR #751.
The documentation also facilitated a smooth installation and allowed me to
create a repeatable script that, when paired with the VM, worked as expected.
----
Jeff Steinmetz
Principal Architect
Akili Interactive
www.akiliinteractive.com
> From: Eric Charles <[email protected]>
> Subject: [DISCUSS] PR #208 - R Interpreter for Zeppelin
> Date: Wed, 30 Dec 2015 14:04:33 GMT
>Hi,
>
>I had a look at https://github.com/apache/incubator-zeppelin/pull/208
>(and related Github repo https://github.com/elbamos/Zeppelin-With-R [1])
>
>Here are a few topics for discussion based on my experience developing
>https://github.com/datalayer/zeppelin-R [2].
>
>1. rscala jar not in Maven Repository
>
>[1] copies the source (scala and R) code from rscala repo and
>changes/extends/repackages it a bit. [2] declares the jar as system
>scoped library. I recently had incompatibility issues between the 1.0.8
>(the one you get since 2015-12-10 when you install rscala in your R
>environment) and the 1.0.6 jar I use as part of the zeppelin-R build.
>To avoid such issues, why not let the user choose the version via a
>build-time property to match the version running on his host? This would
>also let us benefit from the next rscala releases, which fix bugs and
>bring new features... It also means we don't have to copy the rscala
>code into the Zeppelin tree.
>
>2. Interpreters
>
>[1] proposes 2 interpreters, %sparkr.r and %sparkr.knitr, which are
>implemented in their own module apart from the Spark one. To align with
>the existing pyspark implementation, why not integrate the R code into
>the Spark module? Is there any reason to keep two versions that do
>basically the same thing? The single magic keyword would then be %spark.r.
>
>3. Rendering TABLE plot when interpreter result is a dataframe
>
>This may be confusing. What if I display a plot and simply want to print
>the first 10 rows at the end of my code? To keep the same behavior as
>the other interpreters, we could make this feature optional (disabled by
>default, enabled via property).
>
>
>Thx, Eric