Hi everyone,

We would appreciate it if you could have a look at the proposal presented by Tanveer below and share your thoughts.
Regards,
Danyaal Malik

> On 13-Nov-2025, at 3:17 PM, Tanveer Zia <[email protected]> wrote:
>
> Hi Spark community,
>
> I'd like to propose creating a dedicated Apache Spark Connect client for
> Scala 3, maintained as a separate project from the core Spark repository.
> This initiative aims to provide first-class Scala 3 support while
> maintaining full compatibility with the Spark Connect protocol.
>
> Motivation
>
> The Scala ecosystem is rapidly adopting Scala 3, with many organizations
> and libraries making the transition. Currently, Spark 4.x's support for
> Scala is limited to 2.12 and 2.13, and adding Scala 3 support to the main
> repository presents several challenges:
>
> - Cross-compilation complexity: Supporting Scala 2.12, 2.13, and 3.x
>   simultaneously significantly increases build complexity and the
>   maintenance burden.
> - Language feature utilization: Scala 3's new features (contextual
>   abstractions, union types, improved metaprogramming) cannot be fully
>   leveraged in cross-compiled code.
> - Dependency management: Different Scala versions often require different
>   dependency versions, complicating the build.
> - Development velocity: Changes require extensive testing across all
>   Scala versions.
>
> Why Spark Connect?
>
> Spark Connect's decoupled architecture makes it an ideal candidate for
> this approach:
>
> - Protocol-based: Communication via gRPC means the client and server can
>   use different Scala versions (see the usage sketch after this list).
> - Reduced surface area: The Connect client is much narrower in scope than
>   full Spark.
> - Clear compatibility target: The protocol specification provides clear
>   compatibility requirements.
> - Growing adoption: Spark Connect is becoming the recommended way to
>   build Spark applications.
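> To make the protocol-based point above concrete, here is a minimal sketch
> of what client code could look like. It assumes the Scala 3 client would
> mirror the entry points of the existing Spark Connect Scala client
> (SparkSession.builder().remote(...) and functions.udf); nothing here is
> final API, and the endpoint URL and method names are only illustrative.
>
>     import org.apache.spark.sql.SparkSession
>     import org.apache.spark.sql.functions.{col, udf}
>
>     @main def connectExample(): Unit =
>       // Talk to a Spark Connect server over gRPC; the server can run any
>       // supported Scala version because only the protocol is shared.
>       val spark = SparkSession.builder()
>         .remote("sc://localhost:15002")
>         .getOrCreate()
>
>       // The DataFrame API is unchanged from the Scala 2.13 client.
>       val ids = spark.range(1, 6)
>
>       // A UDF written with Scala 3 syntax; it would be shipped to the
>       // server through the Connect protocol's UDF support.
>       val square = udf((x: Long) => x * x)
>       ids.select(col("id"), square(col("id")).as("squared")).show()
>
>       spark.stop()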
> Proposed Approach
>
> - Separate repository: Create a new repository (e.g.,
>   apache/spark-connect-scala3, or within the existing Spark org
>   structure).
> - Independent release cycle: Versions aligned with Spark Connect protocol
>   versions rather than Spark releases. For example:
>   - Client 1.0.x supports Connect Protocol 4.0
>   - Client 1.1.x supports Connect Protocol 4.1
> - Governance:
>   - Maintain under Apache Spark project governance.
>   - Start with dedicated maintainers interested in Scala 3.
>   - Regular sync with the Spark Connect core team.
> - Scope:
>   - Full DataFrame and Dataset API support.
>   - SQL interface.
>   - UDF support with Scala 3 features (as sketched above).
>   - Streaming capabilities.
>   - Focus on the client side only (no server/cluster changes).
>
> Benefits
>
> For users:
>
> - Native Scala 3 development experience.
> - Access to the modern Scala ecosystem and tooling.
> - Improved compile times and IDE support.
> - A gradual migration path from Scala 2.x applications.
>
> For the Spark project:
>
> - Expanded reach into the Scala 3 community.
> - Reduced complexity in the core repository.
> - A testing ground for new client-side features.
> - Community-driven development, reducing the core team's burden.
>
> Potential Concerns and Mitigation
>
> Fragmentation: We'll ensure strict protocol compatibility and extensive
> testing against Spark releases. The API will remain familiar to existing
> Spark users.
>
> Maintenance: By engaging the Scala 3 community early and establishing
> clear contribution guidelines, we can build a sustainable maintenance
> model.
>
> Duplication: While some code structure will be similar, the implementation
> can leverage Scala 3 features for cleaner, more maintainable code.
>
> Next Steps
>
> If there's interest, I propose:
>
> 1. Gathering feedback on this approach.
> 2. Creating a detailed SPIP (Spark Improvement Proposal) if consensus is
>    positive.
> 3. Setting up an initial project structure with interested contributors
>    (a rough build sketch follows at the end of this mail).
> 4. Developing a proof-of-concept implementation.
>
> Looking forward to your thoughts and feedback. I believe this approach
> balances the needs of the Scala 3 community with the stability
> requirements of the core Spark project.
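> To make the version-alignment idea and the initial project setup a bit
> more concrete, a consumer build might look roughly like the following.
> Every coordinate, version, and module name below is a hypothetical
> placeholder to be settled in the SPIP; no such artifact is published
> today.
>
>     // build.sbt -- illustrative only
>     scalaVersion := "3.3.4"
>
>     // Per the versioning idea above: client 1.0.x would track Connect
>     // protocol 4.0, client 1.1.x would track protocol 4.1, and so on.
>     libraryDependencies +=
>       "org.apache.spark" %% "spark-connect-client-scala3" % "1.0.0"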
