[ https://issues.apache.org/jira/browse/SPARK-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359166#comment-14359166 ]

Patrick Wendell commented on SPARK-5654:
----------------------------------------

I see the decision here as somewhat orthogonal to vendors and vendor packaging. 
Vendors can choose whether to package this component or not, and some may leave 
it out until it matures. Of course, they are more encouraged/pressured to 
package things that end up inside the project itself, but that argument could 
be used to justify merging all kinds of random stuff into Spark, so I don't 
think it's a sufficient justification on its own.

The main argument, as I said before, is that non-JVM language APIs are simply 
not possible to maintain outside of the project, because they don't build on 
any even remotely "public" API. Imagine if we tried to have PySpark as its own 
project: it is so tightly coupled to Spark internals that it wouldn't work.

I have argued in the past that things should exist outside the project when 
they can, and I still promote that strongly.

> Integrate SparkR into Apache Spark
> ----------------------------------
>
>                 Key: SPARK-5654
>                 URL: https://issues.apache.org/jira/browse/SPARK-5654
>             Project: Spark
>          Issue Type: New Feature
>          Components: Project Infra
>            Reporter: Shivaram Venkataraman
>
> The SparkR project [1] provides a light-weight frontend to launch Spark jobs 
> from R. The project was started at the AMPLab around a year ago and has been 
> incubated as its own project to make sure it can be easily merged into 
> upstream Spark, i.e. it does not introduce any external dependencies, etc. 
> SparkR’s goals are similar to PySpark’s, and it shares a similar design, as 
> described in our meetup talk [2] and Spark Summit presentation [3].
> Integrating SparkR into the Apache project will enable R users to use Spark 
> out of the box, and given R’s large user base, it will help the Spark project 
> reach more users. Additionally, work-in-progress features like R integration 
> with ML Pipelines and DataFrames can be developed more effectively in a 
> unified code base.
> SparkR is available under the Apache 2.0 License and does not have any 
> external dependencies other than requiring users to have R and Java installed 
> on their machines. SparkR’s developers come from many organizations, 
> including UC Berkeley, Alteryx, and Intel, and we will support future 
> development and maintenance after the integration.
> [1] https://github.com/amplab-extras/SparkR-pkg
> [2] http://files.meetup.com/3138542/SparkR-meetup.pdf
> [3] http://spark-summit.org/2014/talk/sparkr-interactive-r-programs-at-scale-2



