[ 
https://issues.apache.org/jira/browse/SPARK-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183301#comment-14183301
 ] 

Juliet Hougland commented on SPARK-3369:
----------------------------------------

The guarantee of semantic versioning is that all releases within a major 
version are binary compatible. If we were to change the method to return a 
different type, programs written against previous versions could no longer run 
on the current version, which would require us to call the release this change 
appears in Spark 2.0. The general rule is that you can add to an API and remain 
compatible, but you cannot remove from it.

I agree that expanding the API with methods that accept a bunch of 
FlatMapFunction2s would be ugly. I think the upside is that it is incredibly 
transparent to end users. I like that it allows an explicit deprecation and 
suggests an immediate alternative.
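
As a rough sketch of what "add, deprecate, but do not remove" could look like, 
the Java below uses small stand-in types rather than Spark's real classes. The 
names IterableFlatMap, IteratorFlatMap and Partition are hypothetical and exist 
only for this illustration; the actual method and interface names would be 
decided in the patch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not Spark's classes.

/** Existing shape described in the issue: Iterator in, Iterable out. */
interface IterableFlatMap<T, R> {
    Iterable<R> call(Iterator<T> input);
}

/** Proposed shape: Iterator in, Iterator out, matching the Scala API. */
interface IteratorFlatMap<T, R> {
    Iterator<R> call(Iterator<T> input);
}

/** Toy single-partition container standing in for an RDD-like class. */
final class Partition<T> {
    private final List<T> elements;

    Partition(List<T> elements) {
        this.elements = elements;
    }

    /** Old method: kept and deprecated, so code compiled earlier still links. */
    @Deprecated
    <R> List<R> mapPartitions(IterableFlatMap<T, R> f) {
        List<R> out = new ArrayList<>();
        for (R r : f.call(elements.iterator())) {
            out.add(r);
        }
        return out;
    }

    /** New overload: added next to the old one, nothing is removed. */
    <R> List<R> mapPartitions(IteratorFlatMap<T, R> f) {
        List<R> out = new ArrayList<>();
        Iterator<R> it = f.call(elements.iterator());
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

One design point worth weighing: if callers pass an inline lambda, two 
overloads that differ only in the functional interface they accept are 
ambiguous to the compiler, so a distinct method name may be a safer choice than 
an overload.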

> Java mapPartitions Iterator->Iterable is inconsistent with Scala's 
> Iterator->Iterator
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-3369
>                 URL: https://issues.apache.org/jira/browse/SPARK-3369
>             Project: Spark
>          Issue Type: Improvement
>          Components: Java API
>    Affects Versions: 1.0.2
>            Reporter: Sean Owen
>            Priority: Critical
>              Labels: breaking_change
>         Attachments: FlatMapIterator.patch
>
>
> {{mapPartitions}} in the Scala RDD API takes a function that transforms an 
> {{Iterator}} to an {{Iterator}}: 
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD
> In the Java RDD API, the equivalent is a FlatMapFunction, which operates on 
> an {{Iterator}} but is required to return an {{Iterable}}, which is a 
> stronger condition and appears inconsistent. It's a problematic inconsistency 
> though because this seems to require copying all of the input into memory in 
> order to create an object that can be iterated many times, since the input 
> does not afford this itself.
> Similarly for other {{mapPartitions*}} methods and other 
> {{*FlatMapFunction}}s in Java.
> (Is there a reason for this difference that I'm overlooking?)
> If I'm right that this was an inadvertent inconsistency, then the big issue here 
> is that of course this is part of a public API. Workarounds I can think of:
> Promise that Spark will only call {{iterator()}} once, so implementors can 
> use a hacky {{IteratorIterable}} that returns the same {{Iterator}}.
> Or, make a series of methods accepting a {{FlatMapFunction2}}, etc. with the 
> desired signature, and deprecate existing ones.
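
For reference, the first workaround in the quoted description might look 
roughly like the adapter below. This is only a sketch: the class name 
IteratorIterable comes from the issue text, but the implementation is an 
assumption, and the guard that throws on a second iterator() call is an 
addition of this sketch (the issue only says the same Iterator would be 
returned).

```java
import java.util.Iterator;

// Hypothetical adapter named after the issue text; not part of Spark.
final class IteratorIterable<T> implements Iterable<T> {
    private final Iterator<T> iterator;
    private boolean handedOut = false;

    IteratorIterable(Iterator<T> iterator) {
        this.iterator = iterator;
    }

    @Override
    public Iterator<T> iterator() {
        // Assumption of this sketch: fail loudly on a second call rather than
        // silently returning an iterator that may already be exhausted.
        if (handedOut) {
            throw new IllegalStateException("iterator() may only be called once");
        }
        handedOut = true;
        return iterator;
    }
}
```

This avoids copying the input into memory, but it only works if the caller 
really does call iterator() once, which is the promise the workaround relies on.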


