I agree that a product must be usable first. But pinning dependencies to specific versions with `==` is exactly what will make PySpark unusable.
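A minimal sketch of the failure mode, in pure Python (the package name and all version numbers are hypothetical, and this is a toy model, not pip's actual resolver): once one requirement is a hard `==` pin, any other requirement that doesn't include that exact version leaves the resolver with an empty candidate set.

```python
# Toy model of resolving a single package's version. The package versions
# below are hypothetical; real resolution is done by pip/uv across all
# requirement sources at once.

def satisfies(version, spec):
    """Check a version tuple like (1, 26, 4) against '== 1.26.4' or '>= 2.0'."""
    op, _, ver = spec.partition(" ")
    target = tuple(int(p) for p in ver.split("."))
    if op == "==":
        return version[:len(target)] == target
    if op == ">=":
        return version >= target
    raise ValueError(f"unsupported operator: {op!r}")

def resolvable(available, specs):
    """Versions that satisfy every constraint at once; pip must find one."""
    return [v for v in available if all(satisfies(v, s) for s in specs)]

available = [(1, 26, 4), (2, 0, 0), (2, 1, 3)]

# Range requirements leave the resolver room to co-resolve:
assert resolvable(available, [">= 1.20", ">= 2.0"]) == [(2, 0, 0), (2, 1, 3)]

# A hard `==` pin plus any disjoint requirement yields no candidates -
# the "dependencies can't be resolved" error users would see:
assert resolvable(available, ["== 1.26.4", ">= 2.0"]) == []
```

The empty result in the last line is the toy-model analogue of pip's resolution failure when a pinned PySpark dependency conflicts with another package in the same requirements file.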
First of all, I think we can agree that many users use PySpark alongside other Python packages. If we conflict with those packages, `pip install -r requirements.txt` won't work: pip will complain that the dependencies can't be resolved, which completely breaks our users' workflows. Even locking the dependency versions won't help. The user would have to install PySpark first and then the other packages to override PySpark's dependencies, and they couldn't keep their dependency list in a single file - that's a horrible user experience.

When I look at controversial topics, I hold a strong belief that I can't be the only smart person in the world. If an idea is good, others must already be doing it. Can we find any recognized package in the market that pins its dependencies to specific versions? The only case where that works is when the package is *all* the user needs. That's why we pin versions for Docker images, HTTP services, or standalone tools - users just need something that works out of the box. If we consider PySpark the dominant package - meaning that if a user employs it, it must be the most important element in their project and everything else must comply with it - pinning versions might be viable.

I'm not familiar with Java dependency solutions or with how users use Spark from Java, but I am familiar with the Python ecosystem and community. If we pin to specific versions, we will face significant criticism. If we must do it, at least don't make it the default. As I said above, I don't have a strong opinion about having a `pyspark[pinned]` extra - users who only need PySpark and no other packages could use that. But that's extra maintenance effort, and we'd need to think about exactly what gets pinned - we have a lot of PySpark install options.

Tian Gao

On Sun, Mar 29, 2026 at 7:12 PM Cheng Pan <[email protected]> wrote:

> I think the community has already reached consensus to freeze
> dependencies in minor releases.
> > SPARK-54633 - SPIP: Accelerating Apache Spark Release Cadence [1]
> >
> > Clear rules for changes allowed in minor vs. major releases:
> > - Dependencies are frozen and behavioral changes are minimized in minor
> > releases.
>
> I would interpret the proposed dependency policy as applying to both
> Java/Scala and Python dependency management for Spark. If so, that means
> PySpark will always use pinned dependency versions starting with 4.3.0.
> But if the intention is to only apply such a dependency policy to
> Java/Scala, then it creates a very strange situation - an extremely
> conservative dependency management strategy for Java/Scala, and an
> extremely liberal one for Python.
>
> To Tian Gao,
>
> > Pinning versions is a double-edged sword; it doesn't always make us
> > more secure - that's my major point.
>
> A product must be usable first, then come security, performance, etc. If
> it claims to require `foo>=2.0.0`, how do you ensure it is compatible with
> foo `2.3.4`, `3.x.x`, `4.x.x`? Such incompatibility failures have actually
> occurred many times, e.g. [2]. On the contrary, if it claims to require
> `foo==2.0.0`, that means it was thoroughly tested with `foo==2.0.0`, and
> users take their own risk using it with other `foo` versions. For example,
> if `foo` strictly follows semantic versioning, it should work with
> `foo<3.0.0`, but this is not Spark's responsibility; users should assess
> and assume the risk of incompatibility themselves.
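The semantic-versioning point above can be sketched concretely (pure Python; the package `foo` and all version numbers are hypothetical): a release tested only with `foo==2.0.0` implies, under strict semver, a user-assumed-risk compatible range of `>=2.0.0,<3.0.0`.

```python
# Sketch of the semver argument for a hypothetical package "foo": a build
# tested only against foo==2.0.0 is expected, under strict semantic
# versioning, to keep working for any 2.x.y >= 2.0.0, but not for 3.x.

def semver_compatible_range(pinned):
    """From a tested pin like (2, 0, 0), derive the implied compatible range:
    inclusive lower bound at the pin, exclusive upper bound at the next major."""
    return pinned, (pinned[0] + 1, 0, 0)

def in_range(version, rng):
    low, high = rng
    return low <= version < high

rng = semver_compatible_range((2, 0, 0))  # i.e. >=2.0.0,<3.0.0

assert in_range((2, 3, 4), rng)       # a later 2.x should stay compatible
assert not in_range((3, 0, 0), rng)   # a new major may break the API
assert not in_range((1, 9, 9), rng)   # older than the tested pin
```

In this framing, shipping `foo==2.0.0` documents what was actually tested, and the `<3.0.0` range is a risk the user chooses to take, not one the project guarantees.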
> [1] https://issues.apache.org/jira/browse/SPARK-54633
> [2] https://github.com/apache/spark/pull/52633
>
> Thanks,
> Cheng Pan
>
> On Mar 28, 2026, at 06:59, Holden Karau <[email protected]> wrote:
>
> Response inline
>
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>
> On Fri, Mar 27, 2026 at 1:01 PM Nicholas Chammas <[email protected]> wrote:
>
>> On Mar 27, 2026, at 12:31 PM, Holden Karau <[email protected]> wrote:
>>
>> One possibility would be to make the pinned version optional (e.g.
>> pyspark[pinned]) or publish a separate constraints file for people to
>> optionally use with -c?
>>
>> Perhaps I am misunderstanding your proposal, Holden, but this is possible
>> today for people using modern Python packaging workflows that use lock
>> files. In fact, it happens automatically; all transitive dependencies are
>> pinned in the lock file, and this is by design.
>
> So for someone installing into a fresh venv with uv, pip, or conda, where
> does this come from?
>
> The idea here is that we provide the versions we used during the release
> stage, so if folks want a "known safe" initial starting point for a new
> env, they've got one.
>
>> Furthermore, it is straightforward to add additional restrictions to your
>> project spec (i.e. pyproject.toml) so that when the packaging tool builds
>> the lock file, it does so with whatever restrictions you want that are
>> specific to your project. That could include specific versions or version
>> ranges of libraries to exclude, for example.
> Yes, but as it stands we leave it to the end user to start from scratch
> picking these versions; we can make their lives simpler by providing the
> versions we tested against in a lock file they can choose to use, ignore,
> or update to their desired versions.
>
> Also, for interactive workloads I more often see a bare requirements file
> or even pip installs in notebook cells (but this could be sample bias).
>
>> I had to do this, for example, on a personal project that used PySpark
>> Connect but which was pulling in a version of grpc that was generating a
>> lot of log noise
>> <https://github.com/grpc/grpc/issues/38336#issuecomment-2588422915>. I
>> pinned the version of grpc in my project file and let the packaging tool
>> resolve all the requirements across PySpark Connect and my custom
>> restrictions.
>>
>> Nick
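The workflow Nick describes could look roughly like this in a project spec (a sketch only: the project name and the grpcio version number are illustrative assumptions, not a recommendation):

```toml
# pyproject.toml (sketch; version numbers are hypothetical)
[project]
name = "my-spark-app"
version = "0.1.0"
dependencies = [
    "pyspark[connect]>=4.0",
    # Project-specific restriction: pin the transitive grpc dependency
    # to a version that avoids the noisy log output.
    "grpcio==1.67.1",
]
```

A lock-file tool such as uv or pip-tools then resolves PySpark Connect's requirements together with the custom pin, and the resulting lock file pins every transitive dependency for that project.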
