Shivaram Venkataraman created SPARK-7230:
--------------------------------------------

             Summary: Make RDD API private in SparkR for Spark 1.4
                 Key: SPARK-7230
                 URL: https://issues.apache.org/jira/browse/SPARK-7230
             Project: Spark
          Issue Type: Sub-task
          Components: SparkR
    Affects Versions: 1.4.0
            Reporter: Shivaram Venkataraman
            Assignee: Shivaram Venkataraman
            Priority: Critical


This ticket proposes making the RDD API in SparkR private for the 1.4 release. 
The motivation for doing so is discussed in a larger design document aimed at 
a more top-down design of the SparkR APIs. A first cut that discusses 
motivation and proposed changes can be found at http://goo.gl/GLHKZI

The main points in that document that relate to this ticket are:
- The RDD API requires knowledge of the distributed system and is fairly low 
level. This makes it a poor fit for a number of R users who are used to 
higher-level packages that work out of the box.
- The RDD implementation in SparkR is not fully robust right now: we are 
missing features such as spilling for aggregation and handling partitions that 
don't fit in memory. There are further limitations, such as the lack of 
hashCode for non-native types, which might affect the user experience.

The only change we will make for now is to not export the RDD functions as 
public methods in the SparkR package. I will create another ticket to discuss 
the public API for 1.5 in more detail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

