As you can see, I've been working on Scala 2.13 support. The umbrella
issue is https://issues.apache.org/jira/browse/SPARK-25075 and I wanted
to lay out status and strategy.

This will not be done for 3.0. At the least, there are a few key
dependencies (Chill, Kafka) that aren't published for 2.13, and at
least one change requires removing an API that is deprecated as of 3.0.
Realistically: maybe Spark 3.1. I don't yet think it's pressing.


Making the change is difficult as it's hard to understand the extent
of the necessary changes until the whole thing minimally compiles for
2.13. I have gotten essentially that far in a local clone. The good
news is I don't see any obvious hard blockers, but the changes add up
to thousands of lines in 200+ files.


What do we need to do for 3.0? Any changes that entail breaking a
public API, ideally. The biggest issue there is that extensive
changes to the Scala collection hierarchy mean that many public APIs
that return a Seq, Map, TraversableOnce, etc _will_ actually change
types in 2.13 (become immutable). See:
https://issues.apache.org/jira/browse/SPARK-27683 and
https://issues.apache.org/jira/browse/SPARK-29292 as the main
examples.
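
To make the type change concrete, here is a minimal sketch. The
object and method names are hypothetical and are not Spark APIs; the
only assumption is that a method is declared with the unqualified
`Seq`, which means a different type in each Scala version.

```scala
import scala.collection.mutable.ArrayBuffer

object SeqAliasSketch {
  // Hypothetical API for illustration only; not a real Spark method.
  // The unqualified `Seq` in the signature resolves to scala.collection.Seq
  // on 2.12 and to scala.collection.immutable.Seq on 2.13.
  def hostNames(): Seq[String] = {
    val buf = ArrayBuffer("host1", "host2")
    // Returning `buf` directly compiles on 2.12 only; the explicit
    // conversion compiles on both versions.
    buf.toSeq
  }

  // Spelling out the general type keeps the signature the same on both
  // versions, at the cost of touching every such declaration.
  def hostNamesGeneral(): scala.collection.Seq[String] =
    ArrayBuffer("host1", "host2")
}
```

The second method shows what keeping the exact same public type would
look like, which is the larger change discussed next.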

In both cases, keeping the exact same public type would require much
bigger changes. These are the kind of changes that all applications
face when migrating to 2.13 though. 2.12 and 2.13 apps were never
meant to be binary-compatible. So, in both cases we're not changing
these, to avoid a lot of change and parallel source trees.

I _think_ we're done with any other must-do changes for 3.0, therefore.


What _can_ we do for 3.0? Small changes that don't affect the 2.12
build are OK, and that's what you see in pull requests going in at the
moment. The big question is whether we want to do the large change for
https://issues.apache.org/jira/browse/SPARK-29292 before 3.0. It will
mean adding a ton of ".toSeq" and ".toMap" calls to make mutable
collections immutable when passed to methods. In theory, it won't
affect behavior. We'll have to see if it does in practice.
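
As a rough sketch of what those call-site changes look like (the
helper below is hypothetical, not Spark code):

```scala
import scala.collection.mutable.ArrayBuffer

object ToSeqSketch {
  // Hypothetical helper standing in for any method with a `Seq` parameter.
  def total(xs: Seq[Int]): Int = xs.sum

  def run(): Int = {
    val buf = ArrayBuffer(1, 2, 3)
    // `total(buf)` compiles on 2.12 but is a type mismatch on 2.13,
    // where `Seq` is immutable. Adding `.toSeq` compiles on both.
    total(buf.toSeq)
  }
}
```

One place behavior could differ: as far as I know, `.toSeq` on an
ArrayBuffer returns the buffer itself on 2.12 but copies into an
immutable collection on 2.13, so code that mutates the buffer after
the conversion may observe different results across versions.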

The rest will have to wait until after 3.0, I believe, including even
testing the 2.13 build, which will probably turn up some more issues.


Thoughts on approach?
