Hi all,

I wanted to offer a slightly different perspective regarding the project's
long-term health.
I see a compelling argument for prioritizing efforts that address *codebase
simplification* before investing heavily in a major language upgrade,
especially given the Spark Connect option for users and developers.

My main point centers on the value proposition of this significant change:

   1. *Spark Connect as an Alternative:* For many users, the primary
   benefits of a major language upgrade—such as access to new features and
   APIs—are now substantially covered by *Spark Connect*. It already
   provides a powerful, comparable experience across many use cases, which
   suggests the urgency of a full internal transition is lower than it may
   appear.
   2. *Impact on Long-Term Maintainability:* My primary concern is the
   cumulative impact of these changes on the project’s technical debt. As
   the codebase currently stands, existing complexities (e.g., the parallel
   support for Datasource V1 and V2, the mix of Java and Scala APIs, and,
   until not long ago, the support of multiple Scala versions) already
   challenge *readability and maintenance*.
   3. *Risk of Further Fragmentation:* Layering on support for a new major
   language version (Scala 3), which necessarily differs from previous
   versions, risks further complicating the build matrix, the internal
   logic, and the project structure. I worry this could make it even more
   challenging to onboard new contributors and manage future patches.

I propose we launch a focused initiative to *tighten and consolidate* the
existing codebase. This would involve:

   - *API Simplification:* Creating a roadmap for the eventual deprecation
   and removal of older systems like Datasource V1.

   - *Consolidation:* Reducing the remaining areas of language or version
   fragmentation to make the existing code more straightforward.

   - *Project high-level design doc:* A few-page document or a short video
   that explains the general flow and some of the most important classes,
   so new contributors have a starting point.

By investing in internal cleanup and simplification first, we ensure that
any *future* feature or bug fix will be significantly less disruptive and
more cost-effective, while support for new languages can be handled in a
separate repo, based on Spark Connect - so it won’t impact the core
project.
Any thoughts on this?


Best regards,
Nimrod


On Wed, Nov 5, 2025 at 9:55 AM Norbert Schultz <
[email protected]> wrote:

> Hi Tanveer,
>
> The approach with Spark Connect from Dongjoon Hyun seems like a good
> start, if we want to run Scala 3 applications with a Spark backend.
>
> However, I would also like to see a Scala 3 build of Spark itself, as it
> would make migrating existing applications easier.
>
> For that, it’s maybe a good idea to just start with a small fork to
> gather more information:
>
> - Update https://github.com/apache/spark/pull/50474
> - There don’t seem to be too many Scala macros in the codebase. Also,
> there is no Shapeless. Good.
> - UDFs, Dataset, Encoders, ScalaReflection, etc. use TypeTag to build
> encoders. This should be replaced with a Spark-owned typeclass, which
> can then describe Scala 2- and Scala 3-specific ways. The Scala 2 code
> can then still rely on TypeTags.
> - Enable Scala 3.3.x on the code and see what breaks. At least Scala
> with sbt supports Scala-version-specific code paths (e.g.
> src/main/scala-3, src/main/scala-2); I am sure Maven can do this too.
> Scala-2-specific code goes to scala-2, and stubs should make it
> possible to compile in Scala 3.
> - Implement the stubs for Scala 3 and see how it goes. TypeTags should
> possibly be replaceable by a combination of ClassTag and
> Mirror.ProductOf (guessing).
>
> This could also be possible in a sub-project-wise fashion.
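>
> As a rough sketch of that guess (hypothetical names, just to illustrate
> the idea, not an actual Spark API): on Scala 3, such a Spark-owned
> typeclass could be derived from ClassTag plus Mirror.ProductOf, while
> the Scala 2 side keeps a TypeTag-backed instance.
>
> ```scala
> // Hypothetical Spark-owned typeclass replacing direct TypeTag usage.
> // The names (TypeDescriptor, fieldNames) are illustrative only.
> import scala.reflect.ClassTag
> import scala.deriving.Mirror
> import scala.compiletime.constValueTuple
>
> trait TypeDescriptor[T]:
>   def runtimeClass: Class[?]
>   def fieldNames: Seq[String]
>
> object TypeDescriptor:
>   // Scala 3 derivation: ClassTag supplies the erased runtime class,
>   // Mirror.ProductOf supplies the field labels at compile time.
>   inline given derived[T <: Product](using
>       m: Mirror.ProductOf[T],
>       ct: ClassTag[T]): TypeDescriptor[T] =
>     val labels = constValueTuple[m.MirroredElemLabels]
>       .productIterator.map(_.toString).toSeq
>     new TypeDescriptor[T]:
>       def runtimeClass: Class[?] = ct.runtimeClass
>       def fieldNames: Seq[String] = labels
>
> case class Point(x: Int, y: Int)
>
> @main def demo(): Unit =
>   val d = summon[TypeDescriptor[Point]]
>   println(d.fieldNames)
> ```
>
> A Scala 2 companion could provide the same instances from TypeTag, so
> call sites depend only on the typeclass, not on the reflection API.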
>
> The Scala 3 code style should be kept as similar as possible to the
> existing Scala 2 style, in order not to make things more complicated:
> brace style and no unnecessary new features.
>
> Note: I am not deep in the Spark source code.
>
> Kind Regards,
> Norbert
>
>
>
> Am 04.11.2025 um 12:10 schrieb Tanveer Zia <[email protected]>:
>
> Hi everyone,
>
> I’m Tanveer from Scala Teams. We’re interested in contributing to the
> Scala 3 migration of Apache Spark, as referenced in SPARK-54150
> <https://issues.apache.org/jira/browse/SPARK-54150>.
>
> Could you please share the current status or any existing roadmap for this
> migration? We’d also appreciate guidance on how external contributors can
> best get involved or coordinate with the core team on next steps.
>
> Best regards,
> *Tanveer Zia*
> Scala Teams
>
>
>
> Reactive Core GmbH | Paul-Lincke-Ufer 8b | 10999 Berlin
> Fon: +49 30 9832 4666 | Web: www.reactivecore.de
> Handelsregister: Amtsgericht Charlottenburg HRB 156696 B
> Sitz: Berlin | Geschäftsführer: Norbert Schultz
>
>
