[ https://issues.apache.org/jira/browse/SPARK-7025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Wendell updated SPARK-7025:
-----------------------------------
    Target Version/s: 1.5.0  (was: 1.4.0)

> Create a Java-friendly input source API
> ---------------------------------------
>
>                 Key: SPARK-7025
>                 URL: https://issues.apache.org/jira/browse/SPARK-7025
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>
> The goal of this ticket is to create a simple input source API that we can
> maintain and support long term.
>
> Spark currently has two de facto input source APIs:
> 1. RDD
> 2. Hadoop MapReduce InputFormat
>
> Neither of the above is ideal:
> 1. RDD: It is hard for Java developers to implement RDD, given the implicit
> class tags. In addition, the RDD API depends on Scala's runtime library,
> which does not preserve binary compatibility across Scala versions. If a
> developer chooses Java to implement an input source, it would be great if
> that input source can be binary compatible in years to come.
> 2. Hadoop InputFormat: The Hadoop InputFormat API is overly restrictive. For
> example, it forces key-value semantics, and does not support running
> arbitrary code on the driver side (an example of why this is useful is
> broadcast). In addition, it is somewhat awkward to tell developers that in
> order to implement an input source for Spark, they should learn the Hadoop
> MapReduce API first.
>
> So here's the proposal: an InputSource is described by:
> * an array of InputPartition that specifies the data partitioning
> * a RecordReader that specifies how data on each partition can be read
>
> This interface would be similar to Hadoop's InputFormat, except that there is
> no explicit key/value separation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
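For readers trying to picture the proposal, here is a rough Java sketch of what such an interface could look like. The ticket gives only the names InputSource, InputPartition and RecordReader. Every method signature below, along with the RangePartition, RangeSource and readAll helpers, is an assumption made up for illustration and is not Spark's actual API. The sketch uses plain Java interfaces with no Scala types, and records are single values rather than key/value pairs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

public class InputSourceSketch {

    /** Hypothetical: describes one slice of the input data. */
    interface InputPartition {
        int index();
    }

    /** Hypothetical: reads the records of one partition, with no key/value split. */
    interface RecordReader<T> extends AutoCloseable {
        boolean hasNext();
        T next();
        @Override
        void close();
    }

    /** Hypothetical: an array of partitions plus a way to read each of them. */
    interface InputSource<T> {
        InputPartition[] getPartitions();
        RecordReader<T> createRecordReader(InputPartition partition);
    }

    /** Toy partition covering the half-open integer range [start, end). */
    static final class RangePartition implements InputPartition {
        final int index;
        final int start;
        final int end;

        RangePartition(int index, int start, int end) {
            this.index = index;
            this.start = start;
            this.end = end;
        }

        @Override
        public int index() {
            return index;
        }
    }

    /** Toy source that yields the integers 0..n-1 split over numPartitions slices. */
    static final class RangeSource implements InputSource<Integer> {
        private final int n;
        private final int numPartitions;

        RangeSource(int n, int numPartitions) {
            this.n = n;
            this.numPartitions = numPartitions;
        }

        @Override
        public InputPartition[] getPartitions() {
            InputPartition[] parts = new InputPartition[numPartitions];
            for (int i = 0; i < numPartitions; i++) {
                int start = (int) ((long) i * n / numPartitions);
                int end = (int) ((long) (i + 1) * n / numPartitions);
                parts[i] = new RangePartition(i, start, end);
            }
            return parts;
        }

        @Override
        public RecordReader<Integer> createRecordReader(InputPartition partition) {
            final RangePartition range = (RangePartition) partition;
            return new RecordReader<Integer>() {
                private int current = range.start;

                @Override
                public boolean hasNext() {
                    return current < range.end;
                }

                @Override
                public Integer next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return current++;
                }

                @Override
                public void close() {
                    // Nothing to release in this toy reader.
                }
            };
        }
    }

    /** Reads every partition in order on a single thread; a real engine would distribute this. */
    static <T> List<T> readAll(InputSource<T> source) {
        List<T> out = new ArrayList<>();
        for (InputPartition partition : source.getPartitions()) {
            try (RecordReader<T> reader = source.createRecordReader(partition)) {
                while (reader.hasNext()) {
                    out.add(reader.next());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(readAll(new RangeSource(10, 3)));
    }
}
```

In this sketch getPartitions() stands in for the driver-side step and createRecordReader() for the per-partition read. How the real API would divide work between driver and executors is not specified in the ticket.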