Yes, confirmed that fixing it unfortunately causes trouble in Java 8. See https://issues.apache.org/jira/browse/SPARK-19287 for further discussion.

On Wed, Jan 18, 2017 at 9:00 PM Sean Owen <so...@cloudera.com> wrote:

> Hm. Unless I am also totally missing or forgetting something, I think
> you're right. The equivalent in PairRDDFunctions.scala operates on a
> function from T to TraversableOnce[U] and a TraversableOnce is most like
> java.util.Iterator.
>
> You can work around it by wrapping it in a faked IteratorIterable.
>
> I think this is fixable in the API by deprecating this method and adding a
> new one that takes a FlatMapFunction. We'd have to triple-check in a test
> that this doesn't cause an API compatibility problem with respect to Java 8
> lambdas, but if that's settled, I think this could be fixed without
> breaking the API.
>
> On Wed, Jan 18, 2017 at 8:50 PM Asher Krim <ak...@hubspot.com> wrote:
>
> In Spark 2 + Java + RDD api, the use of iterables was replaced with
> iterators. I just encountered an inconsistency in `flatMapValues` that may
> be a bug:
>
> `flatMapValues` (
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala#L677)
> takes a `FlatMapFunction` (
> https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/core/src/main/java/org/apache/spark/api/java/function/FlatMapFunction.java
> )
>
> The problem is that `FlatMapFunction` was changed to return an iterator,
> but `rdd.flatMapValues` still expects an iterable. Am I using these
> constructs correctly? Is there a workaround other than converting the
> iterator to an iterable outside of the function?
>
> Thanks,
> --
> Asher Krim
> Senior Software Engineer
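
For anyone landing on this thread later, here is a minimal sketch of the "faked IteratorIterable" workaround mentioned above. The class name is taken from that message, but the implementation is my own assumption rather than anything in Spark. It uses only the JDK, and it is deliberately single-use: the wrapped iterator can be traversed only once, so a second call to iterator() fails loudly instead of silently returning nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper (not part of Spark): adapts a one-shot Iterator to the
// Iterable interface so it can be returned where an Iterable is expected.
final class IteratorIterable<T> implements Iterable<T> {
    private Iterator<T> source;

    IteratorIterable(Iterator<T> source) {
        this.source = source;
    }

    @Override
    public Iterator<T> iterator() {
        // An Iterator cannot be rewound, so hand it out exactly once.
        if (source == null) {
            throw new IllegalStateException("iterator() already called; the wrapped Iterator is single-use");
        }
        Iterator<T> it = source;
        source = null;
        return it;
    }

    public static void main(String[] args) {
        Iterator<String> words = Arrays.asList("a", "b", "c").iterator();
        List<String> collected = new ArrayList<>();
        // The for-each loop calls iterator() once, which is all the wrapper allows.
        for (String w : new IteratorIterable<>(words)) {
            collected.add(w);
        }
        System.out.println(collected); // prints [a, b, c]
    }
}
```

Assuming `flatMapValues` still expects a function returning an Iterable, as described in the thread, the call site would look roughly like `pairs.flatMapValues(v -> new IteratorIterable<>(toIterator(v)))`, where `pairs` and `toIterator` are placeholders for your own RDD and iterator-producing code. Whether single traversal is enough depends on how the result is consumed, so treat this as a stopgap rather than a fix.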