+1 (non-binding). This is a way forward for Apache Spark: it lets developers choose either option, gives the community a chance to share critical feedback on Spark Connect, and paves a path for Spark to be accessible from everywhere, including non-JVM languages.
Cheers,
Jules

Sent from my iPhone. Pardon the dumb thumb typos :)

> On Feb 3, 2025, at 11:30 PM, Wenchen Fan <cloud0...@gmail.com> wrote:
>
> Hi all,
>
> There is broad agreement that Spark Connect is crucial for the future stability of Spark APIs, for both end users and developers. At the same time, a couple of PMC members raised concerns about making Spark Connect the default in the upcoming Spark 4.0 release. I'm proposing an alternative approach here: publish an additional Spark distribution with Spark Connect enabled by default. This approach will help promote the adoption of Spark Connect among new users while allowing us to gather valuable feedback. A separate distribution with Spark Connect enabled by default can also promote future adoption of Spark Connect for languages like Rust, Go, or Scala 3.
>
> Here are the details of the proposal:
>
> 1. Spark 4.0 will include three PyPI packages:
>    - pyspark: the classic package.
>    - pyspark-client: the thin Spark Connect Python client. (Note: the Spark 4.0 preview releases published the thin client as pyspark-connect; we will need to rename it in the official 4.0 release.)
>    - pyspark-connect: the package with Spark Connect enabled by default.
> 2. An additional tarball will be added to the Spark 4.0 download page, with updated scripts (spark-submit, spark-shell, etc.) that enable Spark Connect by default.
> 3. A new Docker image will be provided with Spark Connect enabled by default.
>
> By taking this approach, we can make Spark Connect more visible and accessible to users, which is more effective than simply asking them to configure it manually.
>
> Looking forward to hearing your thoughts!
>
> Thanks,
> Wenchen
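
For anyone who hasn't tried Spark Connect yet, here is a minimal sketch of what the thin Python client described above looks like in practice. It uses the existing Spark Connect Python API (nothing new from this proposal) and assumes a Spark Connect server is already running locally on the default port 15002:

    from pyspark.sql import SparkSession

    # Connect to a running Spark Connect server instead of starting a local JVM.
    spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

    # DataFrame operations are sent to the server and executed there.
    spark.range(10).selectExpr("id", "id * 2 AS doubled").show()

    spark.stop()

The distribution with Spark Connect enabled by default would let users write this same code without having to set up the remote endpoint themselves.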