[ https://issues.apache.org/jira/browse/SPARK-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291189#comment-14291189 ]
Sandy Ryza commented on SPARK-3622:
-----------------------------------
Spark already has transformations that take a single RDD and output multiple
RDDs, for example randomSplit.
If the goal is to support computing multiple paths that inherit from the same
parent RDD in a single job, the new API needed is probably something that
batches multiple actions into the same job. I'm going to close this as not an
issue, and we can open a JIRA to add this API as a subtask of SPARK-2688 when
we have a design for it.
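
To make the trade-off concrete, here is a minimal plain-Python model of a
randomSplit-style transformation. It is only a sketch and not Spark's code:
the names random_split and one_split are hypothetical, an in-memory list
stands in for the parent RDD, and it assumes that each output is an
independent filter over the parent that reuses the same seed. Under that
assumption every element lands in exactly one output, but the parent is
traversed once per output, which is the repeated work the issue description
wants to avoid.

```python
import random


def random_split(parent, weights, seed):
    """Split `parent` into len(weights) lists, one filter pass per output."""
    total = float(sum(weights))
    # Cumulative, normalized boundaries, e.g. [0.0, 0.8, 1.0] for [0.8, 0.2].
    bounds = [0.0]
    for w in weights:
        bounds.append(bounds[-1] + w / total)
    bounds[-1] = 1.0  # guard against floating-point rounding

    def one_split(lo, hi):
        # Re-seeding gives every pass the same sequence of draws, so the
        # i-th element sees the same number in each pass and falls into
        # exactly one [lo, hi) range.
        rng = random.Random(seed)
        return [x for x in parent if lo <= rng.random() < hi]

    # One full traversal of the parent per output.
    return [one_split(lo, hi) for lo, hi in zip(bounds, bounds[1:])]


parent = list(range(100))
train, test = random_split(parent, [0.8, 0.2], seed=42)
```

Here train and test together contain every element of parent exactly once,
at the cost of two passes over it. Batching several actions into one job, as
suggested above, would aim to share that traversal instead.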
> Provide a custom transformation that can output multiple RDDs
> -------------------------------------------------------------
>
> Key: SPARK-3622
> URL: https://issues.apache.org/jira/browse/SPARK-3622
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 1.1.0
> Reporter: Xuefu Zhang
>
> All existing transformations return at most one RDD, even those that take
> user-supplied functions, such as mapPartitions(). However, a user-provided
> function may sometimes need to output multiple RDDs, for instance a filter
> function that divides the input RDD into several RDDs. While it's possible to
> get multiple RDDs by transforming the same RDD multiple times, it may be more
> efficient to do this concurrently in one shot, especially when the user's
> existing function already generates different data sets. This is the case in
> Hive on Spark, where Hive's map function and reduce function can output
> different data sets to be consumed by subsequent stages.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)