On 23/02/16 19:52, Amos B. Elberg wrote:
Eric, they're not equivalent. 208 continues to have functionality 702 doesn't, 
including the display system.

I'm not going to tell you what you're doing wrong in your implementation and 
"test" of 208, because the users don't seem to have the same confusion, and 
I've essentially been guiding your development process by pointing out the issues.

All three of the issues you raise were addressed already in other threads:

1. The proposed approach to rscala actually introduces maintenance issues that 
have already broken 702. 702 was then revised to work around that, by 
distributing part of rscala in binary form. But the workaround doesn't deal 
with the issue of R users updating their own installations, and it eliminates 
the purported benefit of the approach.


Using a binary dependency with a specific version pinned at build time is the classical way to deploy to machines. Upgrading a machine to a new rscala library implies rebuilding and redeploying.

This flexibility is only possible with binaries, not with a fork of fixed source code. With 702, you can choose to build with Scala 2.xx and rscala 1.0.8, or whichever version aligns with the library available on your machines.
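Concretely, consuming rscala as a binary would just be a pinned Maven dependency in the interpreter's pom.xml. This is a sketch only - the coordinates and version below are assumptions to illustrate the idea, not the actual 702 build:

```xml
<!-- Hypothetical coordinates: check what the rscala project actually
     publishes for your Scala version before relying on these. -->
<dependency>
  <groupId>org.ddahl</groupId>
  <artifactId>rscala_2.10</artifactId>
  <version>1.0.8</version>
</dependency>
```

Swapping the Scala suffix or the version number here is exactly the alignment flexibility described above, with no forked sources to maintain.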

2. This is purely cosmetic. 208 is outside the spark module because it made 
development, testing and merging cleaner.

Sure, this is cosmetic, but I have tried to stick to the existing pyspark implementation to avoid additional Maven modules. By the way, I also avoided having two magic keywords, as 208 offers, to align with current practice and keep it simple for the end user.


3. 208 has supported the HTML, TABLE and IMG display system all along, in an 
R-consistent manner. 702 originally did not support any of it. After I pointed 
out the gap and users complained, 702 was revised to implement it partially. 
702 still does not support it fully. That's why the user questions about this 
all get asked on 702 - the people using 208 don't need to ask about it, because 
it works as expected.


I quickly pulled and tested your branch today, but running print("%html <h1>hello</h1>") didn't work. I will try again tomorrow.
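For anyone following along: Zeppelin's display system is driven by a magic keyword at the start of the interpreter's output - a paragraph whose output begins with %html (or %table, %img) is rendered accordingly. A minimal Scala sketch of building such output strings (the helper names here are mine, for illustration; the rendering itself is done by Zeppelin):

```scala
// Zeppelin renders interpreter output based on a leading magic keyword.
// Building the strings is plain Scala; Zeppelin does the actual rendering.
object DisplayMagic {
  // Wrap an HTML fragment for the %html display
  def html(fragment: String): String = s"%html $fragment"

  // Build a %table payload: columns are tab-separated, rows newline-delimited
  def table(header: Seq[String], rows: Seq[Seq[String]]): String = {
    val lines = (header +: rows).map(_.mkString("\t"))
    "%table " + lines.mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    println(html("<h1>hello</h1>"))
    println(table(Seq("name", "value"), Seq(Seq("a", "1"), Seq("b", "2"))))
  }
}
```

An R interpreter that follows the same convention only needs to emit, e.g., print("%html <h1>hello</h1>") as its paragraph output.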

On Feb 23, 2016, at 1:20 PM, Eric Charles <e...@apache.org> wrote:

It would make no sense merging both.

From an end-user perspective, I guess both are equivalent, although with the 
last commit I made, the Zeppelin Display system is supported in 702 (I had no 
luck when testing this functionality with 208). As I said, feel free to test 
both and send feature requests.

From a developer perspective, I will reiterate the points I sent in [1], which 
are addressed in 702 (these points make sense to me but haven't had much echo 
so far - I would like feedback on them):

1.- Use the rscala jar instead of forking -> this supports the platform version 
(Scala version, ...) and lets us benefit from new rscala releases and patches 
without having to maintain a fork in the Zeppelin source tree.

2.- Just like Python, develop R in the Spark module

3.- Support the same behavior as the rest (no TABLE when output is a dataframe, 
support the HTML, TABLE and IMG display system, support the Dynamic Form 
system).

I still have the Dynamic Form system operational.

[1] 
http://mail-archives.apache.org/mod_mbox/incubator-zeppelin-dev/201512.mbox/%3C5683E471.9010001%40apache.org%3E

On 23/02/16 19:09, Jeff Steinmetz wrote:
Thank you Amos Elberg & Eric Charles:
Is the goal of the community to merge both 208 and 702 at some point as two 
“different” R interpreters?

One that is
   %r
And another that is
   %spark.r

Still trying to wrap my head around the difference.




On 2/23/16, 9:34 AM, "Amos B. Elberg" <amos.elb...@gmail.com> wrote:

Jeff - 702 isn't a fork, it's an alternative based on 208 that has a subset of 
208's features.  208 is the superset. 208 is also what the community is now 
attempting to integrate.

R does support serialization of functions.

208 does support passing a spark table back and forth between R and scala. 
Passing a data.frame through the Zeppelin context will fail in spark up to 1.5. 
It may now be working for some data frames in 1.6.

There are examples that do all these things in the documentation for 208 on my 
repo at github.com/elbamos/Zeppelin-With-R

On Feb 23, 2016, at 12:03 PM, Jeff Steinmetz <jeffrey.steinm...@gmail.com> 
wrote:

Hello zeppelin dev group,

Regarding the R interpreter pull requests 208 and 702: I am trying to figure 
out whether the functionality of these overlaps, or whether one supports 
something the other doesn't. Is 702 a superset of 208 (702 is a fork of 208)?

Can you pass the reference of a distributed (parallelized) dataframe built in %spark 
(scala) to the R interpreter?   Similar to z.put("myDF", myDF)?

Similarly, since R doesn't support serialization of functions (unless you use 
something from the SparkR library), is there an example of collecting the 
parallel DF to a local DF (which I realize means the dataset needs to fit in 
local memory on the Zeppelin server)?
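The workflow being asked about can be sketched as Zeppelin notebook paragraphs. This is a sketch only, not runnable outside a notebook: it assumes the ZeppelinContext `z` and `sqlContext` are bound in the paragraph, as they are for the existing Scala/pyspark interpreters:

```scala
// %spark paragraph: build a distributed DataFrame and publish a reference to it
val myDF = sqlContext
  .createDataFrame(Seq(("a", 1), ("b", 2)))
  .toDF("name", "value")
z.put("myDF", myDF)   // hands the *reference* to the Zeppelin context, not the data

// A later paragraph (or another interpreter with z.get support) retrieves the
// reference and collects it into local memory on the Zeppelin server:
val local = z.get("myDF").asInstanceOf[org.apache.spark.sql.DataFrame].collect()
// `local` is an Array[Row] in driver memory, so the dataset must fit locally
```

As Amos notes above, passing a data.frame through the Zeppelin context reportedly fails on Spark up to 1.5, so treat this as the intended shape rather than guaranteed behavior on every version.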

I plan to dig into this a bit and help out where appropriate; however, it's 
unclear which PR to focus my efforts on.

Best,
Jeff Steinmetz
Principal Architect
Akili Interactive Labs







On 2/23/16, 8:01 AM, "elbamos" <g...@git.apache.org> wrote:

Github user elbamos commented on the pull request:

   https://github.com/apache/incubator-zeppelin/pull/702#issuecomment-187764059

   @btiernay support for that has been in 208 all along...

On Feb 23, 2016, at 9:27 AM, Bob Tiernay <notificati...@github.com> wrote:

@echarles This is great! Thanks for all your hard work. Very much appreciated!




