RE 702:
I wanted to respond to Eric’s discussion from December 30.
I finally set aside a good chunk of dedicated, uninterrupted time, which meant
I had a chance to really dig into this with a Data Science R developer hat on.
I also thought about this from a DevOps point of view (deploying in an EC2
cluster, standalone, locally, VM).
I tested it with a Spark installation outside of the Zeppelin build, as if it
were running on a cluster or a standalone install.
I also had a chance to dig under the hood a bit, and explore what the
Java/Scala code in PR 702 is doing.
I like the simplicity of this PR (the source code and approach).
It works as expected: all graphics work, and the interactive charts work.
I also see your point about rendering the text result vs. a TABLE plot when
the R interpreter result is a data frame.
To confirm: the approach is to use %sql to display it in a native Zeppelin
visualization.
Your approach makes sense, since it is in line with how this works in other
Zeppelin workflows.
I suppose you could add an R interpreter function, such as
z.R.showDFAsTable(fooDF), if we wanted to force the data frame into a %table
without having to jump to %sql (perhaps a nice addition in this or a future PR).
It’s GREAT that %r print('%html') works with the Zeppelin display system (as
well as the other display system methods)!
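For reference, the display-system usage I tested was along these lines (the
HTML content itself is illustrative):

```
%r
print("%html <h3>Hello from R via the Zeppelin display system</h3>")
```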
Regarding the rscala jar: you have a profile that will allow us to sync up the
rscala version, so that makes sense as well.
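If the profile exposes the jar version as a build property, syncing it could
look something like this (the profile name `r` and the property name
`rscala.version` are my assumptions, not necessarily what the PR uses):

```
mvn clean package -Pr -Drscala.version=1.0.6 -DskipTests
```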
This too worked as expected. I specifically installed rscala (as you describe
in your docs) in the VM with:
curl https://cran.r-project.org/src/contrib/Archive/rscala/rscala_1.0.6.tar.gz \
  -o /tmp/rscala_1.0.6.tar.gz
R CMD INSTALL /tmp/rscala_1.0.6.tar.gz
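For the VM script, the install commands can be wrapped in a small sketch that
pins the version in one place (the RSCALA_VERSION variable and the dry-run
echoes are my additions, not part of the PR):

```shell
#!/bin/sh
set -e
# Pin the rscala version in one place so the docs and VM script stay in sync.
RSCALA_VERSION="1.0.6"
TARBALL="/tmp/rscala_${RSCALA_VERSION}.tar.gz"
URL="https://cran.r-project.org/src/contrib/Archive/rscala/rscala_${RSCALA_VERSION}.tar.gz"

# Dry run: print what would be executed.
echo "curl $URL -o $TARBALL"
echo "R CMD INSTALL $TARBALL"
# To actually install, run the two commands printed above.
```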
Installing rscala outside of the Zeppelin dependencies does seem to keep this
PR simpler, and it reduces the licensing overhead required to get this PR
through (based on comments I see from others).
I will need to add the two rscala install lines above to PR #751 (I will add
this today):
https://github.com/apache/incubator-zeppelin/pull/751
Regarding the interpreters: just having %r as our first interpreter keyword
makes sense. Loading knitr within the interpreter to enable rendering (versus
having a dedicated %knitr interpreter) seems to keep things simple.
In summary: this looks good, since everything in your sample R notebook (as
well as a few other tests I tried) worked for me using the VM script in PR #751.
The documentation also facilitated a smooth installation and allowed me to
create a repeatable script that, when paired with the VM, worked as expected.
----
Jeff Steinmetz
Principal Architect
Akili Interactive
www.akiliinteractive.com
> From: Eric Charles <[email protected]>
> Subject: [DISCUSS] PR #208 - R Interpreter for Zeppelin
> Date: Wed, 30 Dec 2015 14:04:33 GMT
>Hi,
>
>I had a look at https://github.com/apache/incubator-zeppelin/pull/208
>(and related Github repo https://github.com/elbamos/Zeppelin-With-R [1])
>
>Here are a few topics for discussion based on my experience developing
>https://github.com/datalayer/zeppelin-R [2].
>
>1. rscala jar not in Maven Repository
>
>[1] copies the source (scala and R) code from rscala repo and
>changes/extends/repackages it a bit. [2] declares the jar as system
>scoped library. I recently had incompatibility issues between the 1.0.8
>(the one you get since 2015-12-10 when you install rscala in your R
>environment) and the 1.0.6 jar I use as part of the zeppelin-R build.
>To avoid such issues, why not let the user choose the version via a
>build-time property to match the version running on his host? This would
>also let us benefit from the next rscala releases, which fix bugs and
>bring new features... It also means we don't have to copy the rscala
>code into the Zeppelin tree.
>
>2. Interpreters
>
>[1] proposes 2 interpreters, %sparkr.r and %sparkr.knitr, which are
>implemented in their own module apart from the Spark one. To align with
>the existing pyspark implementation, why not integrate the R code into
>the Spark module? Is there any reason to keep two versions that do
>basically the same thing? The single magic keyword would then be %spark.r.
>
>3. Rendering TABLE plot when interpreter result is a dataframe
>
>This may be confusing. What if I display a plot and simply want to print
>the first 10 rows at the end of my code? To keep the same behavior as
>the other interpreters, we could make this feature optional (disabled by
>default, enabled via property).
>
>
>Thx, Eric