[GitHub] incubator-zeppelin pull request: R Interpreter for Zeppelin

Leemoonsoo Mon, 16 Nov 2015 04:48:55 -0800

Github user Leemoonsoo commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/208#issuecomment-157017167
  
    People sometimes argue, and in that I see passion.
    But when people don't respect each other, it easily turns into blaming and fighting.
    Let's try to avoid blaming and fighting and have a productive argument instead.
    
    
    @elbamos 
    
    #### 1. The R-Scala Interface
    
    > I've explained several times that the proposal to use SparkR 
bi-directionally doesn't work. I don't feel that I have more to add about that.
    > 
    > I will try to reduce the size of the code that originated in rscala.
    > 
    > If this is not going to be acceptable to the PCCM, please tell me now.
    
    I have described several times how PySparkInterpreter achieves bi-directional invocation over the one-way connection provided by py4j, and the same idea can be applied to SparkR.
    
    Whether you take the idea or not is up to you, but I just want to mention that:
    
      * Consider communicating in a way that is consistent with how Zeppelin talks to PySparkInterpreter. That will later help new contributors understand PySparkInterpreter and RInterpreter in the same way, so we can expect more contributors.
      * There are fewer chances of problems, because it requires only a single socket connection between R and the JVM.
      * The code base will be smaller, which means easier maintenance as well as faster code review.
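
A minimal sketch of the idea, for illustration only: the guest (R or Python) is the only side that opens a connection, yet the host (JVM) can still "invoke" it, because the guest keeps pulling requests from the host and pushing results back. This is not Zeppelin's or py4j's actual code. The names `HostSide`, `guest_loop`, `get_request` and `set_result` are hypothetical, and in-process queues stand in for the single socket connection.

```python
import queue
import threading


class HostSide:
    """Stands in for the JVM side: it only answers calls, it never dials out."""

    def __init__(self):
        self._requests = queue.Queue()
        self._results = queue.Queue()

    def interpret(self, code, timeout=5.0):
        """Host-side entry point: queue a request and wait for the guest's result."""
        self._requests.put(code)
        return self._results.get(timeout=timeout)

    def shutdown(self):
        """Ask the guest loop to stop."""
        self._requests.put(None)

    # The two methods below are the only ones the guest calls,
    # i.e. the traffic that would travel over the one-way connection.
    def get_request(self):
        return self._requests.get()  # blocks until the host has work

    def set_result(self, value):
        self._results.put(value)


def guest_loop(host):
    """Stands in for the R/Python side: pull work, evaluate it, push the result."""
    while True:
        code = host.get_request()
        if code is None:  # shutdown signal
            break
        try:
            # Toy evaluator with builtins disabled; a real guest would run R code.
            result = repr(eval(code, {"__builtins__": {}}, {}))
        except Exception as exc:
            result = "error: " + type(exc).__name__
        host.set_result(result)


def demo():
    host = HostSide()
    worker = threading.Thread(target=guest_loop, args=(host,), daemon=True)
    worker.start()
    outputs = [host.interpret("1 + 2"), host.interpret("1 / 0")]
    host.shutdown()
    worker.join(timeout=5.0)
    return outputs
```

Calling `demo()` runs two requests through the loop. The host never opens a connection to the guest, and errors come back as ordinary results over the same channel.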
    
    
    #### 2. KnitR
    
    > What you're proposing is that users enter the same boilerplate, which 
they would have to figure out for themselves, every time they want to use knitr.
    > 
    > Knitr and the repl are fundamentally two different ways for users to 
interact with R. They have very different behaviors in terms of error reporting 
and handling visualizations.
    > 
    > If you don't want to trust me about this, then I suggest we ask some 
other R users what makes the most sense.
    > 
    > If this is not going to be acceptable to the PCCM, please tell me now.
    
    
    I explained why I think it can be implemented as a function, `z.knitr(input)`.
    You said you are an R user, so I'm asking you: can you please provide some use cases or examples showing why KnitR needs to be implemented as a separate Interpreter rather than a function? You can show how KnitR and the REPL differ in terms of
    
      * ways for users to interact
      * error reporting
      * handling visualizations
    
    
    #### 3. KnitR GPL License
    
    > KnitR is an optional external dependency. This is not a licensing problem.
    > 
    > It is also not a licensing problem to interact with GPL code that isn't 
supplied with Zeppelin.
    > 
    > For example, R itself is GPL code. So if Zeppelin cannot interact with external GPL code, then there cannot be an R interpreter in Zeppelin at all.
    > 
    > Considering that Spark interacts with R, I think this issue is closed.
    
    I'm not sure how Spark handles the license issue with R. My guess is that, in general, the output generated by a GPL-licensed compiler is not itself covered by the GPL. So it could be okay for SparkR to give input to R, get output from it, and use that output.
    
    KnitR is, I think, a little different. It runs as a library, so it is **dynamically linked** with Zeppelin through R.
    
    ```
    Zeppelin - R - KnitR
    ```
    
    If you think that is okay, it means that not only KnitR but any other GPL-licensed library can be used inside Zeppelin in the same way.
    
    
    #### 4. License and copyright
    
    > License and Copyright problems are one of the highest priority item in 
Zeppelin project
    > 
    > Huh? What we were talking about is who gets identified in code as the author. That is obviously not a license/copyright issue; it's an issue of credit.
    > 
    > You said that it is discouraged to identify anybody as the author in 
Apache projects.
    > 
    > However, the current code does identify authors, with you identified as 
the principal author.
    > 
    > So, at this point I'm not sure what you're referring to?
    
    The current code with Author tags (at least the ones with my name) was written before the copyright was transferred from NFLabs to the ASF by signing https://www.apache.org/licenses/software-grant.txt.
    
    Thanks for pointing out that some author tags remain; I'm going to remove them.
    So please consider removing yours, too.
    
    
    #### 6. Travis Builds
    
    > Actually the error in the travis logs begins with this:
    > 
    > 15/09/23 03:12:22 INFO HiveMetaStore: No user is added in admin role, since config is empty
    > 15/09/23 03:12:24 WARN SparkInterpreter: Can't create HiveContext. Fallback to SQLContext
    > java.lang.reflect.InvocationTargetException
    > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    > at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    > at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    > at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    > That's not coming from rzeppelin. That's coming from the SparkInterpreter 
when rzeppelin asks it to initiate a spark backend.
    > 
    > This is what I mean about issues in the spark-zeppelin interface.
    
    You know what `java.lang.OutOfMemoryError: PermGen space` means, so you should be able to fix it no matter where it comes from. The first thing you can try is increasing the PermGen space for the tests.



