+1 (non-binding) 

This is a good way forward for Apache Spark: it lets developers choose either 
option, gives the community a chance to share critical feedback on Spark 
Connect, and paves a path for Spark to be accessible from everywhere, 
including other non-JVM languages. 

Cheers
Jules 

> On Feb 3, 2025, at 11:30 PM, Wenchen Fan <cloud0...@gmail.com> wrote:
> 
> 
> Hi all,
> 
> There is partial agreement and consensus that Spark Connect is crucial for 
> the future stability of Spark APIs for both end users and developers. At the 
> same time, a couple of PMC members raised concerns about making Spark Connect 
> the default in the upcoming Spark 4.0 release. I’m proposing an alternative 
> approach here: publish an additional Spark distribution with Spark Connect 
> enabled by default. This approach will help promote the adoption of Spark 
> Connect among new users while allowing us to gather valuable feedback, and it 
> can also pave the way for future Spark Connect clients in languages like 
> Rust, Go, or Scala 3.
> 
> Here are the details of the proposal:
> 
> - Spark 4.0 will include three PyPI packages:
>     - pyspark: the classic package.
>     - pyspark-client: the thin Spark Connect Python client. (Note: in the 
>       Spark 4.0 preview releases we published the thin client as 
>       pyspark-connect; we will need to rename it in the official 4.0 release.)
>     - pyspark-connect: the package with Spark Connect enabled by default.
> - An additional tarball will be added to the Spark 4.0 download page with 
>   updated scripts (spark-submit, spark-shell, etc.) that enable Spark Connect 
>   by default.
> - A new Docker image will be provided with Spark Connect enabled by default.
> By taking this approach, we can make Spark Connect more visible and 
> accessible to users, which is more effective than simply asking them to 
> configure it manually.
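> 
> As a rough sketch of the intended experience (the pyspark-client name is the 
> one proposed above, and the sc:// address below is only a placeholder), a new 
> user could run "pip install pyspark-client" and then:
> 
>     from pyspark.sql import SparkSession
> 
>     # Connect to a remote Spark Connect server instead of starting a local JVM.
>     spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
>     spark.range(5).show()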
> 
> Looking forward to hearing your thoughts!
> 
> Thanks,
> Wenchen
> 
