Hi everyone,

We would appreciate it if you could have a look at the proposal presented by 
Tanveer and share your thoughts.

Regards,
Danyaal Malik

> On 13-Nov-2025, at 3:17 PM, Tanveer Zia <[email protected]> wrote:
> 
> Hi Spark community,
> 
> I'd like to propose creating a dedicated Apache Spark Connect client for 
> Scala 3, maintained as a separate project from the core Spark repository. 
> This initiative aims to provide first-class Scala 3 support while maintaining 
> full compatibility with the Spark Connect protocol.
> 
> Motivation
> 
> The Scala ecosystem is rapidly adopting Scala 3, with many organizations and 
> libraries making the transition. Currently, Spark 4.x's support for Scala is 
> limited to 2.12 and 2.13, and adding Scala 3 support to the main repository 
> presents several challenges:
> 
> - Cross-compilation complexity: Supporting Scala 2.12, 2.13, and 3.x 
>   simultaneously significantly increases build complexity and maintenance burden.
> - Language feature utilization: Scala 3's new features (contextual 
>   abstractions, union types, improved metaprogramming) cannot be fully 
>   leveraged in cross-compiled code (see the short sketch after this list).
> - Dependency management: Different Scala versions often require different 
>   dependency versions, complicating the build.
> - Development velocity: Changes require extensive testing across all Scala 
>   versions.
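> 
> As a minimal, purely illustrative sketch of the kind of constructs that 
> cross-compiled 2.13/3.x sources cannot use (the names below are invented for 
> the example and are not part of any Spark API):
> 
>     // Scala 3 only: union types and given/using contextual abstractions.
>     type ColumnRef = String | Int   // a column reference may be a name or an index
> 
>     trait Session:
>       def resolve(ref: ColumnRef): String
> 
>     // The session is supplied implicitly through a `using` clause
>     // instead of a Scala 2 implicit parameter.
>     def describe(ref: ColumnRef)(using s: Session): String =
>       ref match
>         case name: String => s.resolve(name)
>         case idx: Int     => s.resolve(s"_c$idx")
> 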
> Why Spark Connect?
> 
> Spark Connect's decoupled architecture makes it an ideal candidate for this 
> approach:
> 
> - Protocol-based: Communication via gRPC means the client and server can use 
>   different Scala versions (see the sketch after this list).
> - Reduced surface area: The Connect client covers a much narrower scope than 
>   full Spark.
> - Clear compatibility target: The protocol specification provides clear 
>   compatibility requirements.
> - Growing adoption: Spark Connect is becoming the recommended way to build 
>   Spark applications.
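> 
> To make the decoupling concrete, here is a minimal sketch of what client code 
> could look like. It mirrors the builder API of the existing JVM Connect client 
> (SparkSession.builder().remote(...)); the exact API surface of a Scala 3 
> client would of course be defined by the project itself:
> 
>     import org.apache.spark.sql.SparkSession
> 
>     @main def connectExample(): Unit =
>       // "sc://host:port" targets a Spark Connect endpoint over gRPC; the
>       // server can run whatever Scala version Spark itself is built with.
>       val spark = SparkSession.builder()
>         .remote("sc://localhost:15002")
>         .getOrCreate()
> 
>       spark.range(10).filter("id % 2 == 0").show()
> 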
> Proposed Approach
> 
> 1. Separate Repository: Create a new repository (e.g., 
>    apache/spark-connect-scala3, or within the existing Spark org structure).
> 2. Independent Release Cycle: Versions aligned with Spark Connect protocol 
>    versions rather than Spark releases (see the build sketch after this list). 
>    For example:
>    - Client 1.0.x supports Connect Protocol 4.0
>    - Client 1.1.x supports Connect Protocol 4.1
> 3. Governance:
>    - Maintain under Apache Spark project governance
>    - Start with dedicated maintainers interested in Scala 3
>    - Regular sync with the Spark Connect core team
> 4. Scope:
>    - Full DataFrame and Dataset API support
>    - SQL interface
>    - UDF support with Scala 3 features
>    - Streaming capabilities
>    - Focus on the client side only (no server/cluster changes)
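> 
> As a purely hypothetical illustration of the version alignment above, a user's 
> build definition might look like the following; the artifact name 
> spark-connect-scala3 and the version numbers are placeholders, not published 
> coordinates:
> 
>     // build.sbt (sketch only; hypothetical coordinates)
>     ThisBuild / scalaVersion := "3.3.4"
> 
>     libraryDependencies ++= Seq(
>       // Client 1.0.x would target Spark Connect protocol 4.0.
>       "org.apache.spark" %% "spark-connect-scala3" % "1.0.0"
>     )
> 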
> Benefits
> 
> For Users:
> 
> - Native Scala 3 development experience
> - Access to the modern Scala ecosystem and tooling
> - Improved compile times and IDE support
> - Gradual migration path from Scala 2.x applications
> For the Spark Project:
> 
> - Expanded reach into the Scala 3 community
> - Reduced complexity in the core repository
> - Testing ground for new client-side features
> - Community-driven development, reducing the core team's burden
> Potential Concerns and Mitigation
> 
> Fragmentation: We'll ensure strict protocol compatibility and extensive 
> testing against Spark releases. The API will remain familiar to existing 
> Spark users.
> 
> Maintenance: By engaging the Scala 3 community early and establishing clear 
> contribution guidelines, we can build a sustainable maintenance model.
> 
> Duplication: While some code structure will be similar, the implementation 
> can leverage Scala 3 features for cleaner, more maintainable code.
> 
> Next Steps
> 
> If there's interest, I propose:
> 
> 1. Gathering feedback on this approach
> 2. Creating a detailed SPIP (Spark Improvement Proposal) if consensus is positive
> 3. Setting up an initial project structure with interested contributors
> 4. Developing a proof-of-concept implementation
> Looking forward to your thoughts and feedback. I believe this approach 
> balances the needs of the Scala 3 community with the stability requirements 
> of the core Spark project.  
