Hi, Danyaal.

- For Scala 2, the Apache Spark main repository has been providing it for a long time, as we know, and it will be there for the rest of the Apache Spark 4.x life.
- For the "Spark Connect Client for Scala 3", the Apache Spark community may want to maintain the "https://github.com/apache/spark-connect-scala3" repository for Scala 3 users.
For your question, let me illustrate with the example of the Apache Spark K8s Operator repository.

1. 2023-11-09: Discussion started with a request for a new repository.
   https://lists.apache.org/thread/wdy7jfhf7m8jy74p6s0npjfd15ym5rxz
   [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark
2. 2023-11-13: One of the Apache Spark PMC members volunteered as a shepherd.
   https://lists.apache.org/thread/1xl64k2bjtqj153168zd4trwlst7xkml
3. 2023-11-14: An umbrella JIRA issue was created for tracking.
   SPARK-45923 Spark Kubernetes Operator
4. 2023-11-14: A module vote started.
   https://lists.apache.org/thread/kvcf74n603nspg4j7whhst06zdmk66th
   [VOTE] SPIP: An Official Kubernetes Operator for Apache Spark
5. 2023-11-17: The vote passed with 19 +1s (11 binding +1s).
   https://lists.apache.org/thread/92v6mgyk9kvzcfygf52rdkht6tj2c17p
   [VOTE][RESULT] SPIP: An Official Kubernetes Operator for Apache Spark
6. 2024-03-29: The repository was created.
   https://github.com/apache/spark-kubernetes-operator
7. 2024-04-02: A working PR was provided and reviewed.
   https://github.com/apache/spark-kubernetes-operator/pull/2
8. The Apache Spark PMC decided to use it as a reference in order to review responsibly.
   https://github.com/apache/spark-kubernetes-operator/pull/2#issuecomment-2033278195
   > A 19k diff patch is unfortunately not going to be easy to review,
   > or result in good quality reviews. Can you please split this up
   > into more manageable chunks to aid with it ?
   > We can keep this PR as reference to help drive the actual PR reviews
9. The Apache Spark PMC helped the voluntary contributors' activities and shared the progress.
   - 2024-04-22 [1]
   - 2024-04-26 [2]
   - 2024-05-03 [3]
   - 2024-05-20 [4]
   - 2024-07-28 [5]
   - 2024-08-06 [6]
10. 2024-08-28: SPARK-45923 was resolved.
11. 2024-08-08: The Apache Spark website was updated.
    https://github.com/apache/spark-website/pull/542
    Add a curated list for repositories governed by Apache Spark PMC
12. This was a huge activity for the community.
As a result, the community seamlessly organized the related domains into three parts of Apache Spark:

- SPARK-45923 Spark K8s Operator (new repository: 132 tasks)
- SPARK-45869 Revisit Spark Standalone Cluster (supporting SPIP: 110 tasks)
- SPARK-49524 Improve K8s support (existing spark repository area: 70 tasks)

13. You can see the subproject website here. The latest version is 0.6.0, which was released last week.
    https://apache.github.io/spark-kubernetes-operator/

During this long journey, the Apache Spark community succeeded in building a community in an organic and responsible way.

Sincerely,
Dongjoon Hyun.

[1] https://github.com/apache/spark-kubernetes-operator/pull/2#issuecomment-2071471093
[2] https://github.com/apache/spark-kubernetes-operator/pull/2#issuecomment-2080135136
[3] https://github.com/apache/spark-kubernetes-operator/pull/2#issuecomment-2093722066
[4] https://github.com/apache/spark-kubernetes-operator/pull/2#issuecomment-2120918277
[5] https://github.com/apache/spark-kubernetes-operator/pull/2#issuecomment-2254902900
[6] https://github.com/apache/spark-kubernetes-operator/pull/2#issuecomment-2272129504

On 2025/11/12 12:09:29 danyaal malik wrote:
> Hi all,
>
> This is Danyaal Malik from Scala Teams. My team and I are interested in
> creating a Scala client and maintaining it in the long term, as the other
> language clients are being maintained. My question is: how do we submit our
> project under the Apache foundation? I went through the
> https://spark.apache.org/contributing.html link, but it is more related to
> contributions to the Spark project itself.
>
> Regards
> Danyaal Malik
> Scala Teams
>
>
> On 05-Nov-2025, at 7:31 PM, Nimrod Ofek <[email protected]> wrote:
> >
> > Hi all,
> >
> > I wanted to offer a slightly different perspective regarding the project's
> > long-term health.
> > I see a compelling argument for prioritizing efforts that address codebase
> > simplification before investing heavily in a major language upgrade,
> > especially given the Spark Connect option for users and developers.
> >
> > My main point centers on the value proposition of this significant change:
> >
> > Spark Connect as an alternative: For many users, the primary benefits of a
> > major language upgrade, such as access to new features and APIs, are now
> > substantially covered by Spark Connect. It already provides a powerful,
> > similar experience across many use cases, which suggests that the urgency
> > for a full internal transition is lower than it might appear.
> >
> > Impact on long-term maintainability: My primary concern is the cumulative
> > impact of these changes on the project's technical debt. As the codebase
> > currently stands, there are existing complexities (e.g., the parallel
> > support for DataSource V1 and V2, the mix of Java and Scala APIs, and,
> > until not long ago, the support for multiple Scala versions) that already
> > challenge readability and maintenance.
> >
> > Risk of further fragmentation: Layering on support for a new major language
> > version (Scala 3), which necessarily differs from previous versions, risks
> > further complicating the build matrix, the internal logic, and the project
> > structure. I worry this could make it even more challenging to onboard new
> > contributors and to manage future patches.
> >
> > I propose we launch a focused initiative to tighten and consolidate the
> > existing codebase. This would involve:
> >
> > API simplification: creating a roadmap for the eventual deprecation and
> > removal of older systems like DataSource V1.
> >
> > Consolidation: reducing the remaining areas of language or version
> > fragmentation to make the existing code more straightforward.
> >
> > Project high-level design doc: a few-page doc or a video that explains
> > the general flow and some of the most important classes, so that new
> > contributors have a starting point.
> >
> > By investing in internal cleanup and simplification first, we ensure that
> > any future feature or bug fix will be significantly less disruptive and
> > more cost-effective, while support for new languages will be handled in a
> > different repo, based on Spark Connect, so it won't impact the core
> > project.
> >
> > Any thoughts about that?
> >
> > Best regards,
> > Nimrod
> >
> >
> > On Wed, Nov 5, 2025 at 9:55 AM Norbert Schultz
> > <[email protected] <mailto:[email protected]>> wrote:
> >> Hi Tanveer,
> >>
> >> The approach with Spark Connect from Dongjoon Hyun seems like a good
> >> start if we want to run Scala 3 applications with a Spark backend.
> >>
> >> However, I would also like to see a Scala 3 build of Spark itself, as it
> >> would make migrating existing applications easier.
> >>
> >> For that, it is maybe a good idea to just start with a small fork to
> >> gather more information:
> >>
> >> - Update https://github.com/apache/spark/pull/50474
> >> - There don't seem to be too many Scala macros in the codebase. Also,
> >>   there is no Shapeless. Good.
> >> - UDFs, Dataset, Encoders, ScalaReflection, etc. use TypeTag to encode
> >>   decoders. This should be exchanged for some Spark-owned typeclass,
> >>   which can then describe Scala 2/Scala 3 specific ways. The Scala 2
> >>   code can then still rely on TypeTags.
> >> - Enable Scala 3.3.x on the code and see what breaks. At least Scala
> >>   with sbt supports Scala-version-specific code paths (e.g.
> >>   src/main/scala-3, scala-2). I am sure Maven can do this too.
> >>   Scala-2-specific code goes to scala-2. Stubs should make it possible
> >>   to compile in Scala 3.
> >> - Implement the stubs for Scala 3 and see how it goes.
> >> TypeTags should possibly be replaceable by a combination of ClassTag
> >> and Mirror.ProductOf (guessing).
> >>
> >> This could also be done in a sub-project-wise fashion.
> >>
> >> The Scala 3 code style should be as similar as possible to the existing
> >> Scala 2 style, in order not to make it more complicated: so brace style
> >> and no unnecessary new features.
> >>
> >> Note: I am not deep in the Spark source code.
> >>
> >> Kind regards,
> >> Norbert
> >>
> >>
> >>> On 04.11.2025 at 12:10, Tanveer Zia <[email protected]
> >>> <mailto:[email protected]>> wrote:
> >>>
> >>> Hi everyone,
> >>>
> >>> I'm Tanveer from Scala Teams. We're interested in contributing to the
> >>> Scala 3 migration of Apache Spark, as referenced in SPARK-54150
> >>> <https://issues.apache.org/jira/browse/SPARK-54150>.
> >>>
> >>> Could you please share the current status or any existing roadmap for
> >>> this migration? We'd also appreciate guidance on how external
> >>> contributors can best get involved or coordinate with the core team on
> >>> next steps.
> >>>
> >>> Best regards,
> >>> Tanveer Zia
> >>> Scala Teams
> >>
> >>
> >> Reactive Core GmbH | Paul-Lincke-Ufer 8b | 10999 Berlin
> >> Fon: +49 30 9832 4666 | Web: www.reactivecore.de
> >> <http://www.reactivecore.de/>
> >> Handelsregister: Amtsgericht Charlottenburg HRB 156696 B
> >> Sitz: Berlin | Geschäftsführer: Norbert Schultz

---------------------------------------------------------------------
To unsubscribe e-mail: [email protected]
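[Editorial addition] The "Spark-owned typeclass" idea sketched in Norbert's mail above can be illustrated with Scala 3's Mirror-based derivation. This is a minimal sketch, not actual Spark code: the typeclass name (SchemaFor), the string-based schema, and the Person case class are all hypothetical; the point is only that Mirror.ProductOf can recover the per-field type information that Scala 2 code obtains from TypeTag, so a shared typeclass could have a TypeTag-backed instance in scala-2 sources and a Mirror-backed one in scala-3 sources. Brace style is used per Norbert's suggestion.

```scala
import scala.deriving.Mirror
import scala.compiletime.{erasedValue, summonInline}

// Hypothetical Spark-owned typeclass: one abstraction that both
// Scala 2 (TypeTag-based) and Scala 3 (Mirror-based) code could implement.
trait SchemaFor[T] {
  def schema: String
}

object SchemaFor {
  // Base instances for primitive types.
  given SchemaFor[Int] = new SchemaFor[Int] { def schema = "int" }
  given SchemaFor[String] = new SchemaFor[String] { def schema = "string" }

  // Walk the tuple of field types at compile time, summoning an
  // instance for each element (the standard summonAll pattern).
  inline def summonSchemas[Elems <: Tuple]: List[String] =
    inline erasedValue[Elems] match {
      case _: EmptyTuple => Nil
      case _: (h *: t)   => summonInline[SchemaFor[h]].schema :: summonSchemas[t]
    }

  // Derive an instance for any case class via Mirror.ProductOf,
  // replacing what Scala 2 reflection would do with a TypeTag.
  inline given derived[T <: Product](using m: Mirror.ProductOf[T]): SchemaFor[T] = {
    val fields = summonSchemas[m.MirroredElemTypes]
    new SchemaFor[T] { def schema = fields.mkString("struct<", ",", ">") }
  }
}

// Hypothetical user type.
case class Person(name: String, age: Int)

@main def demo(): Unit = {
  // Derivation happens entirely at compile time; no runtime reflection.
  println(summon[SchemaFor[Person]].schema) // struct<string,int>
}
```

In a split-source layout (src/main/scala-2 vs. src/main/scala-3), the trait would live in shared sources while each version directory supplies its own derivation mechanism, which is exactly the stub approach described above.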
