Xuefu Zhang created SPARK-3622:
----------------------------------

             Summary: Provide a custom transformation that can output multiple RDDs
                 Key: SPARK-3622
                 URL: https://issues.apache.org/jira/browse/SPARK-3622
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 1.1.0
            Reporter: Xuefu Zhang


All existing transformations return at most one RDD, even those that take 
user-supplied functions, such as mapPartitions(). However, a user-provided 
function may sometimes need to output multiple RDDs, for instance a filter 
function that divides the input RDD into several RDDs. While it's possible to 
get multiple RDDs by transforming the same RDD multiple times, it may be more 
efficient to do this concurrently in one shot, especially when the user's 
existing function already generates different data sets.
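
To make the trade-off concrete, here is a rough sketch in plain Python. It is 
only a model of the idea and does not use or represent any Spark API: 
partitions are simulated as lists, and the names split_multi_pass, 
split_single_pass, and route are hypothetical, chosen for illustration. It 
compares scanning the input once per output with a single scan in which the 
user function routes each record to a named output.

```python
from typing import Callable, Dict, List, Tuple

Partitions = List[List[int]]


def split_multi_pass(
    partitions: Partitions,
    predicates: Dict[str, Callable[[int], bool]],
) -> Tuple[Dict[str, Partitions], int]:
    """Model of the workaround: filter the same input once per output."""
    scans = 0
    outputs: Dict[str, Partitions] = {}
    for name, pred in predicates.items():
        filtered = []
        for part in partitions:
            scans += 1  # each output re-reads every partition
            filtered.append([x for x in part if pred(x)])
        outputs[name] = filtered
    return outputs, scans


def split_single_pass(
    partitions: Partitions,
    route: Callable[[int], str],
) -> Tuple[Dict[str, Partitions], int]:
    """Model of the proposal: one scan, each record routed to a named output."""
    scans = 0
    outputs: Dict[str, Partitions] = {}
    for idx, part in enumerate(partitions):
        scans += 1  # every partition is read exactly once
        for x in part:
            name = route(x)
            if name not in outputs:
                # keep the partition layout of the input for each output
                outputs[name] = [[] for _ in partitions]
            outputs[name][idx].append(x)
    return outputs, scans


partitions = [[1, 2, 3], [4, 5, 6]]
predicates = {"even": lambda x: x % 2 == 0, "odd": lambda x: x % 2 == 1}

multi, multi_scans = split_multi_pass(partitions, predicates)
single, single_scans = split_single_pass(
    partitions, lambda x: "even" if x % 2 == 0 else "odd"
)
```

In this model both approaches produce the same outputs, but the multi-pass 
version reads each partition once per output while the single-pass version 
reads it once in total. In Spark itself, caching the input RDD can reduce the 
cost of repeated scans, at the price of holding the data in memory or on disk.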

This is the case in Hive on Spark, where Hive's map function and reduce function 
can output different data sets to be consumed by subsequent stages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

