Hi, Danyaal.

- For Scala 2, the Apache Spark main repository has been providing it for a long time, as we know, and it will be there for the rest of the Apache Spark 4.x life.
- For the "Spark Connect Client for Scala 3", the Apache Spark community may want to maintain the "https://github.com/apache/spark-connect-scala3" repository for Scala 3 users.
For your question, let me illustrate with the example of the Apache Spark K8s Operator repository.

1. 2023-11-09: Discussion started with a request for a new repository.
   https://lists.apache.org/thread/wdy7jfhf7m8jy74p6s0npjfd15ym5rxz
   [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark
2. 2023-11-13: One of the Apache Spark PMC members volunteered as a shepherd.
   https://lists.apache.org/thread/1xl64k2bjtqj153168zd4trwlst7xkml
3. 2023-11-14: An umbrella JIRA issue was created for tracking.
   SPARK-45923 Spark Kubernetes Operator
4. 2023-11-14: A module vote started.
   https://lists.apache.org/thread/kvcf74n603nspg4j7whhst06zdmk66th
   [VOTE] SPIP: An Official Kubernetes Operator for Apache Spark
5. 2023-11-17: The vote passed with 19 +1s (11 binding +1s).
   https://lists.apache.org/thread/92v6mgyk9kvzcfygf52rdkht6tj2c17p
   [VOTE][RESULT] SPIP: An Official Kubernetes Operator for Apache Spark
6. 2024-03-29: The repository was created.
   https://github.com/apache/spark-kubernetes-operator
7. 2024-04-02: A working PR was provided and reviewed.
   https://github.com/apache/spark-kubernetes-operator/pull/2
8. The Apache Spark PMC decided to use it as a reference in order to review responsibly.
   https://github.com/apache/spark-kubernetes-operator/pull/2#issuecomment-2033278195
   > A 19k diff patch is unfortunately not going to be easy to review,
   > or result in good quality reviews. Can you please split this up
   > into more manageable chunks to aid with it ?
   > We can keep this PR as reference to help drive the actual PR reviews
9. The Apache Spark PMC helped the voluntary contributors' activities and shared the progress.
   - 2024-04-22 [1]
   - 2024-04-26 [2]
   - 2024-05-03 [3]
   - 2024-05-20 [4]
   - 2024-07-28 [5]
   - 2024-08-06 [6]
10. 2024-08-28: SPARK-45923 was resolved.
11. 2024-08-08: The Apache Spark website was updated.
    https://github.com/apache/spark-website/pull/542
    Add a curated list for repositories governed by Apache Spark PMC
12. This was a huge activity for the community.
As a result, the community seamlessly organized the related domains into three parts of Apache Spark:

- SPARK-45923 Spark K8s Operator (new repository: 132 tasks)
- SPARK-45869 Revisit Spark Standalone Cluster (supporting SPIP: 110 tasks)
- SPARK-49524 Improve K8s support (existing spark repository area: 70 tasks)

13. You can see the subproject website here. The latest version is 0.6.0, which was released last week.
    https://apache.github.io/spark-kubernetes-operator/

During this long journey, the Apache Spark community succeeded in building a community in an organic and responsible way.

Sincerely,
Dongjoon Hyun.

[1] https://github.com/apache/spark-kubernetes-operator/pull/2#issuecomment-2071471093
[2] https://github.com/apache/spark-kubernetes-operator/pull/2#issuecomment-2080135136
[3] https://github.com/apache/spark-kubernetes-operator/pull/2#issuecomment-2093722066
[4] https://github.com/apache/spark-kubernetes-operator/pull/2#issuecomment-2120918277
[5] https://github.com/apache/spark-kubernetes-operator/pull/2#issuecomment-2254902900
[6] https://github.com/apache/spark-kubernetes-operator/pull/2#issuecomment-2272129504

On 2025/11/12 12:09:29 danyaal malik wrote:
> Hi all,
>
> This is Danyaal Malik from Scala Teams. My team and I are interested in
> creating a Scala client and maintaining it in the long term, as the other
> language clients are being maintained. My question is: how do we submit our
> project under the Apache foundation? I went through the
> https://spark.apache.org/contributing.html link, but it is more related to
> contributions to the Spark project itself.
>
> Regards
> Danyaal Malik
> Scala Teams
>
>
> On 05-Nov-2025, at 7:31 PM, Nimrod Ofek <[email protected]> wrote:
> >
> > Hi all,
> >
> > I wanted to offer a slightly different perspective regarding the project's
> > long-term health.
> > I see a compelling argument for prioritizing efforts that address codebase
> > simplification before investing heavily in a major language upgrade,
> > especially given the Spark Connect option for users and developers.
> >
> > My main point centers on the value proposition of this significant change:
> >
> > Spark Connect as an alternative: For many users, the primary benefits of a
> > major language upgrade, such as access to new features and APIs, are now
> > substantially covered by Spark Connect. It already provides a powerful,
> > similar experience across many use cases, which suggests that the urgency
> > for a full internal transition is lower than it might appear.
> >
> > Impact on long-term maintainability: My primary concern is the cumulative
> > impact of these changes on the project's technical debt. As the codebase
> > currently stands, there are existing complexities (e.g., the parallel
> > support for DataSource V1 and V2, the mix of Java and Scala APIs, and,
> > until not long ago, the support for multiple Scala versions) that already
> > challenge readability and maintenance.
> >
> > Risk of further fragmentation: Layering on support for a new major language
> > version (Scala 3), which necessarily differs from previous versions, risks
> > further complicating the build matrix, the internal logic, and the project
> > structure. I worry this could make it even more challenging to onboard new
> > contributors and to manage future patches.
> >
> > I propose we launch a focused initiative to tighten and consolidate the
> > existing codebase. This would involve:
> >
> > API simplification: creating a roadmap for the eventual deprecation and
> > removal of older systems like DataSource V1.
> >
> > Consolidation: reducing the remaining areas of language or version
> > fragmentation to make the existing code more straightforward.
> >
> > Project high-level design doc: a few-page doc or a video that explains
> > the general flow and some of the most important classes, so that new
> > contributors have a starting point.
> >
> > By investing in internal cleanup and simplification first, we ensure that
> > any future feature or bug fix will be significantly less disruptive and
> > more cost-effective, while support for new languages will be handled in a
> > different repo, based on Spark Connect, so it won't impact the core
> > project.
> >
> > Any thoughts about that?
> >
> > Best regards,
> > Nimrod
> >
> >
> > On Wed, Nov 5, 2025 at 9:55 AM Norbert Schultz
> > <[email protected] <mailto:[email protected]>> wrote:
> >> Hi Tanveer,
> >>
> >> The approach with Spark Connect from Dongjoon Hyun seems like a good
> >> start if we want to run Scala 3 applications with a Spark backend.
> >>
> >> However, I would also like to see a Scala 3 build of Spark itself, as it
> >> would make migrating existing applications easier.
> >>
> >> For that, it is maybe a good idea to just start with a small fork to
> >> gather more information:
> >>
> >> - Update https://github.com/apache/spark/pull/50474
> >> - There don't seem to be too many Scala macros in the codebase. Also,
> >>   there is no Shapeless. Good.
> >> - UDFs, Dataset, Encoders, ScalaReflection, etc. use TypeTag to encode
> >>   decoders. This should be exchanged for some Spark-owned typeclass,
> >>   which can then describe Scala 2/Scala 3 specific ways. The Scala 2
> >>   code can then still rely on TypeTags.
> >> - Enable Scala 3.3.x on the code and see what breaks. At least Scala
> >>   with sbt supports Scala-version-specific code paths (e.g.
> >>   src/main/scala-3, scala-2). I am sure Maven can do this too.
> >>   Scala-2-specific code goes to scala-2. Stubs should make it possible
> >>   to compile in Scala 3.
> >> - Implement the stubs for Scala 3 and see how it goes.
> >> TypeTags should possibly be replaceable by a combination of ClassTag
> >> and Mirror.ProductOf (guessing).
> >>
> >> This could also be done in a sub-project-wise fashion.
> >>
> >> The Scala 3 code style should be as similar as possible to the existing
> >> Scala 2 style, in order not to make it more complicated: so brace style
> >> and no unnecessary new features.
> >>
> >> Note: I am not deep in the Spark source code.
> >>
> >> Kind regards,
> >> Norbert
> >>
> >>
> >>> On 04.11.2025 at 12:10, Tanveer Zia <[email protected]
> >>> <mailto:[email protected]>> wrote:
> >>>
> >>> Hi everyone,
> >>>
> >>> I'm Tanveer from Scala Teams. We're interested in contributing to the
> >>> Scala 3 migration of Apache Spark, as referenced in SPARK-54150
> >>> <https://issues.apache.org/jira/browse/SPARK-54150>.
> >>>
> >>> Could you please share the current status or any existing roadmap for
> >>> this migration? We'd also appreciate guidance on how external
> >>> contributors can best get involved or coordinate with the core team on
> >>> next steps.
> >>>
> >>> Best regards,
> >>> Tanveer Zia
> >>> Scala Teams
> >>
> >>
> >> Reactive Core GmbH | Paul-Lincke-Ufer 8b | 10999 Berlin
> >> Fon: +49 30 9832 4666 | Web: www.reactivecore.de
> >> <http://www.reactivecore.de/>
> >> Handelsregister: Amtsgericht Charlottenburg HRB 156696 B
> >> Sitz: Berlin | Geschäftsführer: Norbert Schultz

---------------------------------------------------------------------
To unsubscribe e-mail: [email protected]
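[Editorial addition] The "Spark-owned typeclass" idea sketched in Norbert's mail above can be illustrated with Scala 3's Mirror-based derivation. This is a minimal sketch, not actual Spark code: the typeclass name (SchemaFor), the string-based schema, and the Person case class are all hypothetical; the point is only that Mirror.ProductOf can recover the per-field type information that Scala 2 code obtains from TypeTag, so a shared typeclass could have a TypeTag-backed instance in scala-2 sources and a Mirror-backed one in scala-3 sources. Brace style is used per Norbert's suggestion.

```scala
import scala.deriving.Mirror
import scala.compiletime.{erasedValue, summonInline}

// Hypothetical Spark-owned typeclass: one abstraction that both
// Scala 2 (TypeTag-based) and Scala 3 (Mirror-based) code could implement.
trait SchemaFor[T] {
  def schema: String
}

object SchemaFor {
  // Base instances for primitive types.
  given SchemaFor[Int] = new SchemaFor[Int] { def schema = "int" }
  given SchemaFor[String] = new SchemaFor[String] { def schema = "string" }

  // Walk the tuple of field types at compile time, summoning an
  // instance for each element (the standard summonAll pattern).
  inline def summonSchemas[Elems <: Tuple]: List[String] =
    inline erasedValue[Elems] match {
      case _: EmptyTuple => Nil
      case _: (h *: t)   => summonInline[SchemaFor[h]].schema :: summonSchemas[t]
    }

  // Derive an instance for any case class via Mirror.ProductOf,
  // replacing what Scala 2 reflection would do with a TypeTag.
  inline given derived[T <: Product](using m: Mirror.ProductOf[T]): SchemaFor[T] = {
    val fields = summonSchemas[m.MirroredElemTypes]
    new SchemaFor[T] { def schema = fields.mkString("struct<", ",", ">") }
  }
}

// Hypothetical user type.
case class Person(name: String, age: Int)

@main def demo(): Unit = {
  // Derivation happens entirely at compile time; no runtime reflection.
  println(summon[SchemaFor[Person]].schema) // struct<string,int>
}
```

In a split-source layout (src/main/scala-2 vs. src/main/scala-3), the trait would live in shared sources while each version directory supplies its own derivation mechanism, which is exactly the stub approach described above.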
