[
https://issues.apache.org/jira/browse/SPARK-27683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837967#comment-16837967
]
Sean Owen commented on SPARK-27683:
-----------------------------------
There are many usages of TraversableOnce, though many are in internal packages
and classes, which aren't so urgent to address. The concern is public APIs, and
the main ones are flatMap / flatMapValues / flatMapGroups. These accept a
function that returns a TraversableOnce.
That's nice as TraversableOnce is a supertype of Iterable and Iterator, so one
can return either in a flatMap. In Scala 2.13 IterableOnce will play that role
but it isn't available in 2.12. This makes it hard to create an API method that
works in both, and cuts off the possibility, I think, of deprecating the
current method while adding the new one.
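To make the current shape concrete, here is a minimal sketch. MiniColl is a
hypothetical stand-in for an RDD-like class, not Spark's actual code; only the
signature of flatMap is meant to mirror the public API described above. It
should compile on 2.12, and on 2.13 with a deprecation warning, since
TraversableOnce survives there only as a deprecated alias.

```scala
// Hypothetical stand-in for an RDD-like class; not Spark's implementation.
class MiniColl[T](val items: Seq[T]) {
  // Same shape as the existing public methods: f returns a TraversableOnce.
  def flatMap[U](f: T => TraversableOnce[U]): MiniColl[U] =
    new MiniColl(items.flatMap(f))
}
```

With this signature a caller can return either a collection or an iterator,
e.g. coll.flatMap(x => Seq(x, x)) or coll.flatMap(x => Iterator(x, x)).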
IterableOnce will have basically two subclasses, Iterable and Iterator. These
exist now. We could add flatMap overloads for both of those now and deprecate
the existing method. However, this won't compile: every overload takes a
single function argument, so they all have the same name and the same
signature after erasure. Even if we drop the TraversableOnce version, the
Iterable and Iterator overloads still clash with each other for the same reason.
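A sketch of the erasure problem, using a hypothetical ErasureDemo class (an
assumption for illustration, not Spark code). The second overload is left
commented out because, as far as I can tell, the compiler rejects it with a
"double definition ... same type after erasure" error.

```scala
// Hypothetical class for illustration only.
class ErasureDemo[T](val items: Seq[T]) {
  // Overload 1: the parameter type T => Iterable[U] erases to Function1.
  def flatMap[U](f: T => Iterable[U]): ErasureDemo[U] =
    new ErasureDemo(items.flatMap(f))

  // Overload 2: T => Iterator[U] also erases to Function1, so uncommenting
  // this is expected to fail compilation as a double definition.
  // def flatMap[U](f: T => Iterator[U]): ErasureDemo[U] =
  //   new ErasureDemo(items.iterator.flatMap(f).toList)
}
```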
There's a scala-collections-compat library that attempts to bridge some of the
difference between 2.12 and 2.13. It does provide some help with IterableOnce,
but the compat type is in a different package (scala.collection.compat) than
the final one, and is in any event just a type alias for TraversableOnce. It
doesn't seem to help.
I considered adding a dummy implementation of IterableOnce to our source,
extending TraversableOnce. However, this too won't help without defining
implicit conversions from Iterable and Iterator to IterableOnce that users
would have to import.
We could instead change the one flatMap method to accept an Iterator, or an
Iterable. Either one makes some usages of flatMap stop working. Of the two,
Iterator is probably the better choice. It's less restrictive on the caller,
it's how the Java equivalent works now, and is more consistent with what
TraversableOnce means now. That would mean you can't flatMap to a collection,
which is unfortunate; you'd have to add ".iterator".
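A minimal sketch of the Iterator-only option and what it would mean for
callers. IterOnlyColl is a hypothetical class for illustration, not a proposed
Spark signature.

```scala
// Hypothetical class for illustration only.
class IterOnlyColl[T](val items: Seq[T]) {
  // The single flatMap accepts only functions that return an Iterator.
  def flatMap[U](f: T => Iterator[U]): IterOnlyColl[U] =
    new IterOnlyColl(items.iterator.flatMap(f).toList)
}
```

A caller that used to write coll.flatMap(x => Seq(x, x + 1)) would have to
change it to coll.flatMap(x => Seq(x, x + 1).iterator), since a Seq is not an
Iterator. Callers that already return an Iterator are unaffected.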
Another option, of course, is to maintain separate source trees for 2.12 and
2.13 in the future. That's somewhat painful if it means maintaining two versions of
PairRDDFunctions, RDD, DStream, etc. We may be able to break out just the part
that varies into a separate class though.
I'm interested in thoughts on whether it's better to go for separate source
trees to minimize the change needed from callers, or whether requiring an
Iterator is acceptable enough as a breaking change in 3.0. If we're going to do
that, it has to be for 3.0, and unfortunately I don't see a way to keep the
existing method as deprecated while adding the new one.
> Remove usage of TraversableOnce
> -------------------------------
>
> Key: SPARK-27683
> URL: https://issues.apache.org/jira/browse/SPARK-27683
> Project: Spark
> Issue Type: Sub-task
> Components: ML, Spark Core, SQL, Structured Streaming
> Affects Versions: 3.0.0
> Reporter: Sean Owen
> Assignee: Sean Owen
> Priority: Major
>
> As with {{Traversable}}, {{TraversableOnce}} is going away in Scala 2.13. We
> should use {{IterableOnce}} instead. This one is a bigger change as there are
> more API methods with the existing signature.