[
https://issues.apache.org/jira/browse/SPARK-19287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen updated SPARK-19287:
------------------------------
Docs Text: JavaPairRDD.flatMapValues() now requires a FlatMapFunction as its
argument, so the function passed in must return an Iterator rather than an
Iterable. This corrects a long-standing inconsistency between the Scala and
Java APIs and lets the caller supply just an Iterator instead of a full
Iterable. Existing functions passed to this method can call ".iterator()" on
their current return value to comply with the new signature.
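
To illustrate the migration described in the Docs Text, here is a minimal
sketch that runs without Spark on the classpath. The nested FlatMapFunction
interface is a local stand-in that only mirrors the shape of Spark's interface
of the same name; splitWords, SPLIT and drain are hypothetical names invented
for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class FlatMapValuesMigration {
    // Local stand-in (assumption): same shape as Spark's FlatMapFunction,
    // whose call() returns an Iterator.
    public interface FlatMapFunction<T, R> {
        Iterator<R> call(T t) throws Exception;
    }

    // Hypothetical existing helper that returns an Iterable, as functions
    // written against the old signature did.
    public static Iterable<String> splitWords(String line) {
        return Arrays.asList(line.split(" "));
    }

    // Migrated function: the body is unchanged except for the trailing
    // .iterator() call on the existing return value.
    public static final FlatMapFunction<String, String> SPLIT =
        s -> splitWords(s).iterator();

    // Collects everything the function yields for one value into a list.
    public static List<String> drain(FlatMapFunction<String, String> f, String value) {
        List<String> out = new ArrayList<>();
        try {
            Iterator<String> it = f.call(value);
            while (it.hasNext()) {
                out.add(it.next());
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain(SPLIT, "a b c")); // prints [a, b, c]
    }
}
```

In real code the same one-token change would apply to the lambda or function
object passed to JavaPairRDD.flatMapValues().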
> JavaPairRDD flatMapValues requires function returning Iterable, not Iterator
> ----------------------------------------------------------------------------
>
> Key: SPARK-19287
> URL: https://issues.apache.org/jira/browse/SPARK-19287
> Project: Spark
> Issue Type: Bug
> Components: Java API
> Affects Versions: 2.1.1
> Reporter: Sean Owen
> Assignee: Sean Owen
> Priority: Minor
> Labels: release-notes
> Fix For: 3.0.0
>
>
> SPARK-3369 corrected an old oversight in the Java API, wherein
> {{FlatMapFunction}} required an {{Iterable}} rather than an {{Iterator}}. As
> reported by [~akrim], the same kind of problem was also overlooked in
> {{JavaPairRDD}}
> (https://github.com/apache/spark/blob/6c00c069e3c3f5904abd122cea1d56683031cca0/core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala#L677):
> {code}
> def flatMapValues[U](f: JFunction[V, java.lang.Iterable[U]]): JavaPairRDD[K, U] =
> {code}
> As in {{PairRDDFunctions.scala}}, whose {{flatMapValues}} operates on
> {{TraversableOnce}}, this should take a function that returns an
> {{Iterator}}, that is, a {{FlatMapFunction}}.
> We can easily add an overload and deprecate the existing method:
> {code}
> def flatMapValues[U](f: FlatMapFunction[V, U]): JavaPairRDD[K, U]
> {code}
> This is source- and binary-backwards-compatible in Java 7. In Java 8 it is
> binary-backwards-compatible but not source-compatible. The following natural
> usage with Java 8 lambdas becomes ambiguous and won't compile; unfortunately,
> Java can't pick between the two overloads, even based on the lambda's return
> type:
> {code}
> JavaPairRDD<Integer, String> pairRDD = ...
> JavaPairRDD<Integer, Integer> mappedRDD =
> pairRDD.flatMapValues(s -> Arrays.asList(s.length()).iterator());
> {code}
> It can be resolved by explicitly casting the lambda.
> We can at least document this. One day in Spark 3.x this can just be changed
> outright.
> It's conceivable to avoid the ambiguity by giving the new method a different
> name, such as "flatMapValues2" or something similarly ugly.
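
The cast workaround mentioned in the description can be sketched without
Spark as follows. Everything here is an assumption made for illustration: Pair
is a hypothetical miniature of JavaPairRDD holding a single value, and the
nested Function and FlatMapFunction interfaces are local stand-ins that only
mirror the shapes of the two overloads discussed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class FlatMapValuesCast {
    // Local stand-ins (assumptions) for the two functional interfaces.
    public interface Function<T, R> {
        R call(T t) throws Exception;
    }

    public interface FlatMapFunction<T, R> {
        Iterator<R> call(T t) throws Exception;
    }

    // Hypothetical miniature of JavaPairRDD carrying both overloads.
    public static class Pair<V> {
        private final V value;

        public Pair(V value) {
            this.value = value;
        }

        // Old-style overload: the function returns an Iterable.
        public <U> List<U> flatMapValues(Function<V, Iterable<U>> f) {
            List<U> out = new ArrayList<>();
            try {
                for (U u : f.call(value)) {
                    out.add(u);
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
            return out;
        }

        // New-style overload: the function returns an Iterator.
        public <U> List<U> flatMapValues(FlatMapFunction<V, U> f) {
            List<U> out = new ArrayList<>();
            try {
                Iterator<U> it = f.call(value);
                while (it.hasNext()) {
                    out.add(it.next());
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
            return out;
        }
    }

    public static List<Integer> lengthsViaCast(String s) {
        Pair<String> pair = new Pair<>(s);
        // An uncast lambda here is the ambiguous case the issue describes:
        //   pair.flatMapValues(v -> Arrays.asList(v.length()).iterator());
        // The explicit cast names the intended overload.
        return pair.flatMapValues(
            (FlatMapFunction<String, Integer>) v -> Arrays.asList(v.length()).iterator());
    }

    public static void main(String[] args) {
        System.out.println(lengthsViaCast("spark")); // prints [5]
    }
}
```

The cast works because the argument then has a declared type, so only the
overload accepting that type is applicable.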