[
https://issues.apache.org/jira/browse/SPARK-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen resolved SPARK-3369.
------------------------------
Resolution: Fixed
Fix Version/s: 2.0.0
Issue resolved by pull request 10413
[https://github.com/apache/spark/pull/10413]
> Java mapPartitions Iterator->Iterable is inconsistent with Scala's
> Iterator->Iterator
> -------------------------------------------------------------------------------------
>
> Key: SPARK-3369
> URL: https://issues.apache.org/jira/browse/SPARK-3369
> Project: Spark
> Issue Type: Sub-task
> Components: Java API
> Affects Versions: 1.0.2, 1.2.1
> Reporter: Sean Owen
> Assignee: Sean Owen
> Labels: breaking_change, releasenotes
> Fix For: 2.0.0
>
>
> {{mapPartitions}} in the Scala RDD API takes a function that transforms an
> {{Iterator}} to an {{Iterator}}:
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD
> In the Java RDD API, the equivalent is a FlatMapFunction, which operates on
> an {{Iterator}} but is required to return an {{Iterable}}, which is a
> stronger condition and appears inconsistent. It's a problematic inconsistency,
> though, because this seems to require copying all of the input into memory in
> order to create an object that can be iterated many times, since the input
> does not afford this itself.
> Similarly for other {{mapPartitions*}} methods and other
> {{*FlatMapFunction}}s in Java.
> (Is there a reason for this difference that I'm overlooking?)
> If I'm right that this was an inadvertent inconsistency, then the big issue
> here is that of course this is part of a public API. Workarounds I can think of:
> Promise that Spark will only call {{iterator()}} once, so implementors can
> use a hacky {{IteratorIterable}} that returns the same {{Iterator}}.
> Or, make a series of methods accepting a {{FlatMapFunction2}}, etc. with the
> desired signature, and deprecate existing ones.
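
To make the difference described above concrete, here is a minimal sketch in plain Java with no Spark dependency. The interfaces IterableFlatMap and IteratorFlatMap are hypothetical stand-ins for the two function shapes discussed in the issue, not Spark's actual classes, and the helper names (lengthsBuffered, lengthsLazy, drain) are made up for illustration. IteratorIterable is one possible reading of the "hacky" adapter mentioned in the first workaround.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class MapPartitionsSketch {

    // Hypothetical stand-in for the Iterator -> Iterable shape described in the issue.
    interface IterableFlatMap<T, R> {
        Iterable<R> call(Iterator<T> input);
    }

    // Hypothetical stand-in for the Scala-like Iterator -> Iterator shape.
    interface IteratorFlatMap<T, R> {
        Iterator<R> call(Iterator<T> input);
    }

    // Returning a re-iterable result forces the whole partition's output into a buffer.
    static Iterable<Integer> lengthsBuffered(Iterator<String> input) {
        List<Integer> out = new ArrayList<>();
        while (input.hasNext()) {
            out.add(input.next().length());
        }
        return out;
    }

    // Returning an Iterator lets each element be transformed lazily, one at a time.
    static Iterator<Integer> lengthsLazy(final Iterator<String> input) {
        return new Iterator<Integer>() {
            @Override
            public boolean hasNext() {
                return input.hasNext();
            }

            @Override
            public Integer next() {
                return input.next().length();
            }
        };
    }

    // The "hacky" adapter: always hands back the same Iterator, so it is only
    // correct if the caller promises to call iterator() exactly once.
    static final class IteratorIterable<R> implements Iterable<R> {
        private final Iterator<R> iterator;

        IteratorIterable(Iterator<R> iterator) {
            this.iterator = iterator;
        }

        @Override
        public Iterator<R> iterator() {
            return iterator;
        }
    }

    // Collects whatever remains in an Iterator into a List.
    static <R> List<R> drain(Iterator<R> it) {
        List<R> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> partition = Arrays.asList("a", "bb", "ccc");

        IterableFlatMap<String, Integer> buffered = MapPartitionsSketch::lengthsBuffered;
        IteratorFlatMap<String, Integer> lazy = MapPartitionsSketch::lengthsLazy;

        System.out.println(drain(buffered.call(partition.iterator()).iterator()));
        System.out.println(drain(lazy.call(partition.iterator())));

        // Wrapping the lazy Iterator satisfies the Iterable-returning shape,
        // but a second traversal sees nothing because the Iterator is spent.
        Iterable<Integer> wrapped = new IteratorIterable<>(lazy.call(partition.iterator()));
        System.out.println(drain(wrapped.iterator()));
        System.out.println(drain(wrapped.iterator()).isEmpty());
    }
}
```

The adapter avoids the buffering cost only by breaking the usual expectation that an Iterable can be traversed more than once, which is why the first workaround depends on a promise about how often {{iterator()}} is called.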