Hi Spark community,
I'd like to propose creating a dedicated Apache Spark Connect client for
Scala 3, maintained as a separate project from the core Spark repository.
This initiative aims to provide first-class Scala 3 support while
maintaining full compatibility with the Spark Connect protocol.
Motivation
The Scala ecosystem is rapidly adopting Scala 3, with many organizations
and libraries making the transition. Spark's Scala support is currently
limited to 2.12 and 2.13 (Spark 4.x builds only against 2.13), and adding
Scala 3 support to the main repository presents several challenges:
1. *Cross-compilation complexity*: Supporting Scala 2.12, 2.13, and 3.x
simultaneously significantly increases build complexity and maintenance
burden
2. *Language feature utilization*: Scala 3's new features (contextual
abstractions, union types, improved metaprogramming) cannot be fully
leveraged in cross-compiled code
3. *Dependency management*: Different Scala versions often require
different dependency versions, complicating the build
4. *Development velocity*: Changes require extensive testing across all
Scala versions
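To make point 2 concrete, here is a small, self-contained sketch of the kind of Scala 3 code that cannot appear in sources cross-compiled with 2.12/2.13. The names (`CellValue`, `Encoder`, `render`, `write`) are purely illustrative and not part of any Spark API:

```scala
// Illustrative only: Scala 3 features unavailable to cross-compiled code.

// Union types: a heterogeneous value without an ADT wrapper.
type CellValue = String | Long | Double

def render(v: CellValue): String = v match
  case s: String => s
  case l: Long   => l.toString
  case d: Double => f"$d%.2f"

// Contextual abstractions: `given`/`using` replace Scala 2 implicits.
trait Encoder[T]:
  def encode(t: T): String

given Encoder[Int] with
  def encode(t: Int): String = t.toString

def write[T](t: T)(using e: Encoder[T]): String = e.encode(t)

@main def demo(): Unit =
  println(render(42L))   // 42
  println(write(7))      // 7
```

A Scala-3-only client could use such constructs freely throughout its public API, whereas cross-built code must fall back to the 2.13-compatible subset.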
Why Spark Connect?
Spark Connect's decoupled architecture makes it an ideal candidate for this
approach:
- *Protocol-based*: Communication via gRPC means the client and server
can use different Scala versions
- *Reduced surface area*: The Connect client covers a much narrower API
surface than full Spark
- *Clear compatibility target*: Protocol specification provides clear
compatibility requirements
- *Growing adoption*: Spark Connect is becoming the recommended way to
build Spark applications
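Because the client talks to the server purely over gRPC, connecting looks like the following sketch (this mirrors the existing JVM Spark Connect client's `SparkSession.builder().remote(...)` API; the endpoint is an assumption and a running Spark Connect server is required):

```scala
// Sketch: a Spark Connect client only speaks the gRPC protocol, so its
// Scala version is independent of the server's Scala version.
// Assumes a Spark Connect server is listening at sc://localhost:15002.
import org.apache.spark.sql.SparkSession

@main def connectDemo(): Unit =
  val spark = SparkSession
    .builder()
    .remote("sc://localhost:15002") // gRPC endpoint, not a JVM-local master
    .getOrCreate()

  spark.range(5).show() // plan is serialized, executed server-side

  spark.stop()
```

Nothing in this flow requires the client and server to share a Scala binary version, which is what makes a standalone Scala 3 client feasible.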
Proposed Approach
1. *Separate Repository*: Create a new repository (e.g.,
apache/spark-connect-scala3 or within existing Spark org structure)
2. *Independent Release Cycle*: Versioned against Spark Connect
protocol versions rather than Spark releases
- Example: Client 1.0.x supports Connect Protocol 4.0
- Example: Client 1.1.x supports Connect Protocol 4.1
3. *Governance*:
- Maintain under Apache Spark project governance
- Start with dedicated maintainers interested in Scala 3
- Regular sync with Spark Connect core team
4. *Scope*:
- Full DataFrame and Dataset API support
- SQL interface
- UDF support with Scala 3 features
- Streaming capabilities
- Client-side only (no server/cluster changes)
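As a sketch of the UDF point above: a Scala 3 client could accept plain Scala 3 lambdas, as below. Note this assumes the client implements UDF serialization for Scala 3 closures, which is one of the harder open problems in this proposal, not an existing capability; the endpoint and column names are illustrative:

```scala
// Hypothetical sketch of a Scala 3 UDF against a Spark Connect server.
// Assumes Scala 3 lambda serialization is solved client-side.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

@main def udfDemo(): Unit =
  val spark = SparkSession
    .builder()
    .remote("sc://localhost:15002") // assumed server endpoint
    .getOrCreate()

  // `toIntOption` (Scala 2.13+/3 stdlib) makes the parse failure explicit.
  val toInt = udf((s: String) => s.toIntOption.getOrElse(-1))

  spark.range(3)
    .select(col("id").cast("string").as("raw"))
    .select(toInt(col("raw")).as("parsed"))
    .show()

  spark.stop()
```

Getting this path right for Scala 3 closures is precisely where a dedicated client can move faster than a cross-built one.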
Benefits
*For Users:*
- Native Scala 3 development experience
- Access to modern Scala ecosystem and tooling
- Improved compile times and IDE support
- Gradual migration path from Scala 2.x applications
*For Spark Project:*
- Expanded reach into Scala 3 community
- Reduced complexity in core repository
- Testing ground for new client-side features
- Community-driven development reducing core team burden
Potential Concerns and Mitigation
*Fragmentation:* We'll ensure strict protocol compatibility and extensive
testing against Spark releases. The API will remain familiar to existing
Spark users.
*Maintenance:* By engaging the Scala 3 community early and establishing
clear contribution guidelines, we can build a sustainable maintenance model.
*Duplication:* While some code structure will be similar, the
implementation can leverage Scala 3 features for cleaner, more maintainable
code.
Next Steps
If there's interest, I propose:
1. Gathering feedback on this approach
2. Creating a detailed SPIP (Spark Project Improvement Proposal) if
consensus is positive
3. Setting up initial project structure with interested contributors
4. Developing a proof-of-concept implementation
Looking forward to your thoughts and feedback. I believe this approach
balances the needs of the Scala 3 community with the stability requirements
of the core Spark project.