Awesome, I started on one but it’s super rough, so I’ll leave it to you Tian :) (I filed a JIRA, so grab the existing JIRA for coordination.)
Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/ <https://www.fighthealthinsurance.com/?q=hk_email>
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her

On Mon, May 18, 2026 at 5:03 PM Tian Gao <[email protected]> wrote:

> I can work on a prototype. My thought is that we should keep the
> dependency list in `pyproject.toml`. We can have dependency groups for all
> the different scenarios (test/dev, minimum/lint/docs, etc.). Then for
> generating docker images, we include `pyproject.toml` and pip install based
> on that. I believe we can keep the single source of truth in that file
> (which is a common way to do things) and still be flexible.
>
> On Mon, May 18, 2026 at 4:55 PM Holden Karau <[email protected]>
> wrote:
>
>> Single source of truth does sound desirable, let me take a look at
>> narrowing that down a bit too.
>>
>> On Mon, May 18, 2026 at 4:30 PM Tian Gao via dev <[email protected]>
>> wrote:
>>
>>> We can do either a list of packages from `pip freeze` on our website, or
>>> a `pyspark[pinned]` that has `==`. I'm okay with either (or both).
>>>
>>> If we want to do that, we probably want to pin our package versions on
>>> our stable spark versions. We only partially pin our dependencies for our
>>> CI for maintenance branches, so we do not even have the list now (we may
>>> have it for a certain date, but the list could change any time in the
>>> future).
>>>
>>> I think we should come up with a more official CI system so we always
>>> test the released versions (4.0, 4.1 ...) with pinned versions of
>>> packages (which are the "known working dependencies"), and be more relaxed
>>> for dev branches (4.x, master) because we need to test against new releases
>>> of our dependencies.
>>>
>>> More importantly, it would be really nice to have a single source of
>>> truth.
>>> We have too many places to pin the Python dependency versions.
>>>
>>> Tian
>>>
>>> On Sun, May 17, 2026 at 9:52 AM Holden Karau <[email protected]>
>>> wrote:
>>>
>>>> I am at PyCon USA today and the PyPI head just did a call out to audit
>>>> and pin dependencies because the supply chain attacks are increasing
>>>> hockey-stick style.
>>>>
>>>> I think we don’t need to pin just yet but let’s add publishing the
>>>> package versions we built with during CI.
>>>>
>>>> On Wed, Apr 1, 2026 at 7:48 AM Devin Petersohn via dev <
>>>> [email protected]> wrote:
>>>>
>>>>> I think we should do something in response to the growing supply chain
>>>>> attacks rather than just leaving the problem to users. One alternative we
>>>>> could consider for Python specifically is an install target with upper
>>>>> bounded dependencies: `pip install "pyspark[deps-upper-bounded]"`. This
>>>>> wouldn't impact regular use, and seems like it would solve the other
>>>>> problems with publishing lock files, etc. As others have mentioned, this
>>>>> wouldn't *guarantee* security, but it would provide meaningful protection
>>>>> against the worst offenders we've recently seen.
>>>>>
>>>>> On Wed, Apr 1, 2026 at 9:37 AM Cheng Pan <[email protected]> wrote:
>>>>>
>>>>>> > How about as a compromise, we publish (but don’t lock to) the pip freeze outputs of the venvs we use for testing?
>>>>>>
>>>>>> > Where do you propose to publish? Spark website? Maybe in our github repo somewhere?
>>>>>>
>>>>>> > I was thinking just in the publisher artifacts directory we already do.
>>>>>>
>>>>>> +1, I'm fine with any approach, as long as it provides sufficient
>>>>>> info to let users know exactly which versions of dependencies were used
>>>>>> for testing.
>>>>>>
>>>>>> For Java/Scala, we have a script-generated [1] dependency list in the
>>>>>> code repo, at [2].
>>>>>>
>>>>>> [1] https://github.com/apache/spark/blob/branch-4.1/dev/test-dependencies.sh
>>>>>> [2] https://github.com/apache/spark/blob/branch-4.1/dev/deps/spark-deps-hadoop-3-hive-2.3
>>>>>>
>>>>>> Thanks,
>>>>>> Cheng Pan
>>>>>>
>>>>>> On Mar 31, 2026, at 03:12, Holden Karau <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>> I was thinking just in the publisher artifacts directory we already
>>>>>> do.
>>>>>>
>>>>>> On Mon, Mar 30, 2026 at 10:26 AM Tian Gao <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Where do you propose to publish? Spark website? Maybe in our github
>>>>>>> repo somewhere? For Python packages, users rarely look for artifacts
>>>>>>> (and they're difficult to find).
>>>>>>>
>>>>>>> Tian
>>>>>>>
>>>>>>> On Mon, Mar 30, 2026 at 10:04 AM Holden Karau <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> I hear that. How about as a compromise, we publish (but don’t lock
>>>>>>>> to) the pip freeze outputs of the venvs we use for testing?
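
As a rough illustration of the "publish the pip freeze outputs" idea, the sketch below builds a freeze-style list of the distributions visible to the running interpreter, using only the standard library. The function name `freeze_lines` is hypothetical, and this is not how Spark's CI records versions today; running `pip freeze` inside the test venv would serve the same purpose.

```python
from importlib.metadata import distributions


def freeze_lines():
    """Return sorted 'name==version' lines for the installed distributions."""
    pins = {}
    for dist in distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        version = meta.get("Version") if meta is not None else None
        if not name or not version:
            # Skip distributions whose metadata is missing or incomplete.
            continue
        # Key on the lowercased name so a distribution that shows up twice
        # on the path is only listed once.
        pins[name.lower()] = f"{name}=={version}"
    return [pins[key] for key in sorted(pins)]


if __name__ == "__main__":
    # A CI job could redirect this output to a file and upload it with the
    # other release artifacts.
    print("\n".join(freeze_lines()))
```

The output is only a record of what was installed; nothing here locks PySpark's own dependency declarations to those versions.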
>>>>>>>>
>>>>>>>> On Mon, Mar 30, 2026 at 8:04 AM Nicholas Chammas <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> I think supply chain attacks are a problem, but I don’t think we
>>>>>>>>> want to be on the hook for a solution here, even if it’s meant just
>>>>>>>>> for our project.
>>>>>>>>>
>>>>>>>>> There are “good enough” approaches available today for Python that
>>>>>>>>> mitigate most of the risk by excluding recent releases when resolving
>>>>>>>>> what package versions to install.
>>>>>>>>>
>>>>>>>>> uv offers exclude-newer
>>>>>>>>> <https://docs.astral.sh/uv/reference/settings/#exclude-newer>.
>>>>>>>>> pip offers uploaded-prior-to
>>>>>>>>> <https://pip.pypa.io/en/stable/cli/pip_index/#cmdoption-uploaded-prior-to>.
>>>>>>>>> Poetry has an issue open
>>>>>>>>> <https://github.com/python-poetry/poetry/issues/10646> for a
>>>>>>>>> similar feature, plus at least one open PR to close it.
>>>>>>>>>
>>>>>>>>> Users concerned about supply chain attacks would probably get
>>>>>>>>> better results from using these options as compared to installing
>>>>>>>>> pinned dependencies provided by the projects they use.
>>>>>>>>>
>>>>>>>>> Nick
>>>>>>>>>
>>>>>>>>> On Mar 30, 2026, at 3:31 AM, Holden Karau <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> So I think we can ship it as an optional distribution element
>>>>>>>>> (it's literally just another file folks can choose to download/use if
>>>>>>>>> they want).
>>>>>>>>>
>>>>>>>>> Asking users is an idea too, I could put together a survey if we
>>>>>>>>> want?
>>>>>>>>>
>>>>>>>>> On Sun, Mar 29, 2026 at 11:14 PM Tian Gao via dev <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I believe "foo~=2.0.1" is syntactic sugar for "foo>=2.0.1,
>>>>>>>>>> foo==2.0.*". Similarly, "foo>=2.0.0, <3.0.0" is "foo~=2.0". This is
>>>>>>>>>> a nit and we don't need to focus on the syntax.
>>>>>>>>>>
>>>>>>>>>> I don't believe we can ship pyspark with an env lock file. That's
>>>>>>>>>> what users do in their own projects. It's not part of the Python
>>>>>>>>>> package system. What users normally do is install packages, test them
>>>>>>>>>> out, then lock them with either pip or uv - generate a lock file for
>>>>>>>>>> all dependencies and use it across their systems. It's not common for
>>>>>>>>>> packages to list out a "known working dependency list" for users.
>>>>>>>>>>
>>>>>>>>>> However, if we really want to try it out, we can do something
>>>>>>>>>> like `pip install pyspark[full-pinned]` and install every dependency
>>>>>>>>>> pyspark requires with a pinned version. If our users need an
>>>>>>>>>> out-of-the-box solution they can do that. We can also collect
>>>>>>>>>> feedback and see the sentiment from users.
>>>>>>>>>>
>>>>>>>>>> Tian
>>>>>>>>>>
>>>>>>>>>> On Sun, Mar 29, 2026 at 10:29 PM Cheng Pan <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> > If we consider PySpark the dominant package - meaning that if a user employs it, it must be the most important element in their project and everything else must comply with it - pinning versions might be viable.
>>>>>>>>>>>
>>>>>>>>>>> This is not always true, but definitely a major case.
>>>>>>>>>>>
>>>>>>>>>>> > I'm not familiar with Java dependency solutions or how users use spark with Java
>>>>>>>>>>>
>>>>>>>>>>> In Java/Scala, it's rare to use dynamic versions for dependency
>>>>>>>>>>> management. A product declares its transitive dependencies with pinned
>>>>>>>>>>> versions, and the package manager (Maven, SBT, Gradle, etc.) picks the
>>>>>>>>>>> most reasonable version based on resolution rules. The rules are a
>>>>>>>>>>> little different in Maven, SBT, and Gradle; the Maven docs [1] explain
>>>>>>>>>>> how they work.
>>>>>>>>>>>
>>>>>>>>>>> In short, in Java/Scala dependency management, the pinned
>>>>>>>>>>> version is more like a suggested version; it's easy for users to
>>>>>>>>>>> override.
>>>>>>>>>>>
>>>>>>>>>>> As Owen pointed out, things are completely different in the Python
>>>>>>>>>>> world. Neither a pinned version nor the latest version seems ideal.
>>>>>>>>>>> Of the options
>>>>>>>>>>>
>>>>>>>>>>> 1. pinned version (foo==2.0.0)
>>>>>>>>>>> 2. allow maintenance releases (foo~=2.0.0)
>>>>>>>>>>> 3. allow minor feature releases (foo>=2.0.0,<3.0.0)
>>>>>>>>>>> 4. latest version (foo>=2.0.0, or foo)
>>>>>>>>>>>
>>>>>>>>>>> 2 or 3 might be an acceptable solution? And I still
>>>>>>>>>>> believe we should add a disclaimer that this compatibility only
>>>>>>>>>>> holds under the assumption that 3rd-party packages strictly adhere to
>>>>>>>>>>> semantic versioning.
>>>>>>>>>>>
>>>>>>>>>>> > You can totally produce a sort of 'lock' file -- uv.lock, requirements.txt -- expressing a known good / recommended specific resolved environment. That is _not_ what Python dependency constraints are for. It's what env lock files are for.
>>>>>>>>>>>
>>>>>>>>>>> We definitely need such a dependency list in each PySpark release.
>>>>>>>>>>> It's really important for users who need to set up a reproducible
>>>>>>>>>>> environment several years after the release, and it is also a good
>>>>>>>>>>> reference for users who encounter 3rd-party package bugs or battle
>>>>>>>>>>> dependency conflicts when they install lots of packages in a single
>>>>>>>>>>> environment.
>>>>>>>>>>>
>>>>>>>>>>> [1] https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Cheng Pan
>>>>>>>>>>>
>>>>>>>>>>> On Mar 30, 2026, at 11:13, Sean Owen <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> TL;DR Tian is more correct, and == pinning versions is not
>>>>>>>>>>> achieving the desired outcome. There are other ways to do it; I
>>>>>>>>>>> can't think of any other Python package that works that way. This
>>>>>>>>>>> thread is conflating different things.
>>>>>>>>>>>
>>>>>>>>>>> While expressing dependence on "foo>=2.0.0" indeed can be an
>>>>>>>>>>> overly-broad claim -- do you really think it works with 5.x in 10
>>>>>>>>>>> years? -- expressing "foo==2.0.0" is very likely overly narrow. That
>>>>>>>>>>> says "does not work with any other version at all" which is likely
>>>>>>>>>>> more incorrect and more problematic for users.
>>>>>>>>>>>
>>>>>>>>>>> You can totally produce a sort of 'lock' file -- uv.lock,
>>>>>>>>>>> requirements.txt -- expressing a known good / recommended specific
>>>>>>>>>>> resolved environment. That is _not_ what Python dependency constraints
>>>>>>>>>>> are for. It's what env lock files are for.
>>>>>>>>>>>
>>>>>>>>>>> To be sure there is an art to figuring out the right dependency
>>>>>>>>>>> bounds.
>>>>>>>>>>> A reasonable compromise is to allow maintenance releases, as a
>>>>>>>>>>> default when there is nothing more specific known. That is, write
>>>>>>>>>>> "foo~=2.0.2" to mean ">=2.0.2 and <2.1".
>>>>>>>>>>>
>>>>>>>>>>> The analogy to Scala/Java/Maven land does not quite work, partly
>>>>>>>>>>> because Maven resolution is just pretty different, but mostly
>>>>>>>>>>> because the core Spark distribution is the 'server side' and is
>>>>>>>>>>> necessarily a 'fat jar', a sort of statically-compiled artifact that
>>>>>>>>>>> simply has some specific versions in it and can never have different
>>>>>>>>>>> versions because of runtime resolution differences.
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Mar 29, 2026 at 10:02 PM Tian Gao via dev <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I agree that a product must be usable first. Pinning the
>>>>>>>>>>>> version (to a specific number with `==`) will make pyspark
>>>>>>>>>>>> unusable.
>>>>>>>>>>>>
>>>>>>>>>>>> First of all, I think we can agree that many users use PySpark
>>>>>>>>>>>> with other Python packages. If we conflict with other packages,
>>>>>>>>>>>> `pip install -r requirements.txt` won't work. It will complain that
>>>>>>>>>>>> the dependencies can't be resolved, which completely breaks our
>>>>>>>>>>>> users' workflow. Even if the user locks the dependency version, it
>>>>>>>>>>>> won't work. So the user has to install PySpark first, then the other
>>>>>>>>>>>> packages, to override PySpark's dependency. They can't put their
>>>>>>>>>>>> dependency list in a single file - that is a horrible user
>>>>>>>>>>>> experience.
>>>>>>>>>>>>
>>>>>>>>>>>> When I look at controversial topics, I always have a strong
>>>>>>>>>>>> belief that I can't be the only smart person in the world. If an
>>>>>>>>>>>> idea is good, others must already be doing it.
>>>>>>>>>>>> Can we find any recognized package in the market that pins its
>>>>>>>>>>>> dependencies to a specific version? The only case where it works is
>>>>>>>>>>>> when this package is *all* the user needs. That's why we pin
>>>>>>>>>>>> versions for docker images, HTTP services, or standalone tools -
>>>>>>>>>>>> users just need something that works out of the box. If we consider
>>>>>>>>>>>> PySpark the dominant package - meaning that if a user employs it, it
>>>>>>>>>>>> must be the most important element in their project and everything
>>>>>>>>>>>> else must comply with it - pinning versions might be viable.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm not familiar with Java dependency solutions or how users
>>>>>>>>>>>> use spark with Java, but I'm familiar with the Python ecosystem and
>>>>>>>>>>>> community. If we pin to a specific version, we will face significant
>>>>>>>>>>>> criticism. If we must do it, at least don't make it the default.
>>>>>>>>>>>> Like I said above, I don't have a strong opinion about having a
>>>>>>>>>>>> `pyspark[pinned]` - if users only need pyspark and no other packages
>>>>>>>>>>>> they could use that. But that's extra effort for maintenance, and we
>>>>>>>>>>>> need to think about what's pinned. We have a lot of pyspark install
>>>>>>>>>>>> versions.
>>>>>>>>>>>>
>>>>>>>>>>>> Tian Gao
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Mar 29, 2026 at 7:12 PM Cheng Pan <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I think the community has already reached consensus to
>>>>>>>>>>>>> freeze dependencies in minor releases.
>>>>>>>>>>>>>
>>>>>>>>>>>>> SPARK-54633 - SPIP: Accelerating Apache Spark Release Cadence
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>
>>>>>>>>>>>>> > Clear rules for changes allowed in minor vs.
>>>>>>>>>>>>> > major releases:
>>>>>>>>>>>>> > - Dependencies are frozen and behavioral changes are minimized in minor releases.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would interpret the proposed dependency policy as applying to
>>>>>>>>>>>>> both Java/Scala and Python dependency management for Spark. If
>>>>>>>>>>>>> so, that means PySpark will always use pinned dependency versions
>>>>>>>>>>>>> starting with 4.3.0. But if the intention is to only apply such a
>>>>>>>>>>>>> dependency policy to Java/Scala, then it creates a very strange
>>>>>>>>>>>>> situation - an extremely conservative dependency management
>>>>>>>>>>>>> strategy for Java/Scala, and an extremely liberal one for Python.
>>>>>>>>>>>>>
>>>>>>>>>>>>> To Tian Gao,
>>>>>>>>>>>>>
>>>>>>>>>>>>> > Pinning versions is a double-edged sword, it doesn't always make us more secure - that's my major point.
>>>>>>>>>>>>>
>>>>>>>>>>>>> A product must be usable first, then come security, performance,
>>>>>>>>>>>>> etc. If it claims to require `foo>=2.0.0`, how do you ensure it is
>>>>>>>>>>>>> compatible with foo `2.3.4`, `3.x.x`, `4.x.x`? Actually, such
>>>>>>>>>>>>> incompatibility failures have occurred many times, e.g., [2]. On
>>>>>>>>>>>>> the contrary, if it claims to require `foo==2.0.0`, that means it
>>>>>>>>>>>>> was thoroughly tested with `foo==2.0.0`, and users take their own
>>>>>>>>>>>>> risk to use it with other `foo` versions. For example, if `foo`
>>>>>>>>>>>>> strictly follows semantic versioning, it should work with
>>>>>>>>>>>>> `foo<3.0.0`, but this is not Spark's responsibility; users should
>>>>>>>>>>>>> assess and assume the risk of incompatibility themselves.
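
The specifier styles debated in this thread (`==`, `~=`, a bounded range, and a bare `>=`) can be compared with a small sketch. It is a deliberate simplification that only understands plain dotted releases such as `2.0.1`; pre-releases, epochs, and wildcards from the full specifier rules are ignored, and every function name here is made up for illustration. Real tooling would normally rely on the `packaging` library instead.

```python
def parse_release(version):
    """Parse a plain dotted release such as '2.0.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def _padded(a, b):
    """Right-pad two release tuples with zeros so they compare component-wise."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def exactly(candidate, pin):
    """foo==pin"""
    c, p = _padded(parse_release(candidate), parse_release(pin))
    return c == p


def at_least(candidate, floor):
    """foo>=floor"""
    c, f = _padded(parse_release(candidate), parse_release(floor))
    return c >= f


def below(candidate, ceiling):
    """foo<ceiling"""
    c, m = _padded(parse_release(candidate), parse_release(ceiling))
    return c < m


def compatible(candidate, base):
    """foo~=base: at least `base`, and equal to it on all but its last component."""
    base_release = parse_release(base)
    if len(base_release) < 2:
        raise ValueError("~= needs at least two release components")
    prefix = base_release[:-1]
    c, _ = _padded(parse_release(candidate), base_release)
    return at_least(candidate, base) and c[: len(prefix)] == prefix


if __name__ == "__main__":
    for candidate in ("2.0.0", "2.0.2", "2.0.9", "2.1.0", "3.0.0"):
        print(
            candidate,
            "==2.0.2:", exactly(candidate, "2.0.2"),
            "~=2.0.2:", compatible(candidate, "2.0.2"),
            ">=2.0.0,<3.0.0:", at_least(candidate, "2.0.0") and below(candidate, "3.0.0"),
            ">=2.0.0:", at_least(candidate, "2.0.0"),
        )
```

Under these simplified rules `~=2.0.2` accepts `2.0.9` but rejects both `2.0.0` and `2.1.0`, while `~=2.0` behaves like `>=2.0, <3.0`.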
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/SPARK-54633
>>>>>>>>>>>>> [2] https://github.com/apache/spark/pull/52633
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Cheng Pan
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mar 28, 2026, at 06:59, Holden Karau <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Response inline
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Mar 27, 2026 at 1:01 PM Nicholas Chammas <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mar 27, 2026, at 12:31 PM, Holden Karau <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One possibility would be to make the pinned version optional
>>>>>>>>>>>>>> (e.g. pyspark[pinned]) or publish a separate constraints file for
>>>>>>>>>>>>>> people to optionally use with -c?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Perhaps I am misunderstanding your proposal, Holden, but this
>>>>>>>>>>>>>> is possible today for people using modern Python packaging
>>>>>>>>>>>>>> workflows that use lock files. In fact, it happens automatically;
>>>>>>>>>>>>>> all transitive dependencies are pinned in the lock file, and this
>>>>>>>>>>>>>> is by design.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> So for someone installing a fresh venv with uv, pip, or conda,
>>>>>>>>>>>>> where does this come from?
>>>>>>>>>>>>>
>>>>>>>>>>>>> The idea here is we provide the versions we used during the
>>>>>>>>>>>>> release stage so if folks want a “known safe” initial starting
>>>>>>>>>>>>> point for a new env they’ve got one.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Furthermore, it is straightforward to add additional
>>>>>>>>>>>>>> restrictions to your project spec (i.e. pyproject.toml) so that
>>>>>>>>>>>>>> when the packaging tool builds the lock file, it does it with
>>>>>>>>>>>>>> whatever restrictions you want that are specific to your project.
>>>>>>>>>>>>>> That could include specific versions or version ranges of
>>>>>>>>>>>>>> libraries to exclude, for example.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, but as it stands we leave it to the end user to start
>>>>>>>>>>>>> from scratch picking these versions. We can make their lives
>>>>>>>>>>>>> simpler by providing the versions we tested against with a lock
>>>>>>>>>>>>> file they can choose to use, ignore, or update to their desired
>>>>>>>>>>>>> versions and include.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also for interactive workloads I more often see a bare
>>>>>>>>>>>>> requirements file or even pip installs in notebook cells (but this
>>>>>>>>>>>>> could be sample bias).
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I had to do this, for example, on a personal project that
>>>>>>>>>>>>>> used PySpark Connect but which was pulling in a version of
>>>>>>>>>>>>>> grpc that was generating a lot of log noise
>>>>>>>>>>>>>> <https://github.com/grpc/grpc/issues/38336#issuecomment-2588422915>.
>>>>>>>>>>>>>> I pinned the version of grpc in my project file and let the
>>>>>>>>>>>>>> packaging tool resolve all the requirements across PySpark Connect
>>>>>>>>>>>>>> and my custom restrictions.
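
To make the "use, ignore, or update" workflow concrete, here is a minimal sketch of starting from a published list of pins and overriding one entry before installing. The helper names are hypothetical and the package versions are placeholders, not the contents of any real PySpark test environment.

```python
def parse_pins(text):
    """Parse freeze-style 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError(f"not an exact pin: {raw!r}")
        pins[name.strip().lower()] = version.strip()
    return pins


def apply_overrides(published, overrides):
    """Return constraints text: the published pins with the user's overrides applied on top."""
    merged = {**published, **{name.lower(): ver for name, ver in overrides.items()}}
    return "\n".join(f"{name}=={merged[name]}" for name in sorted(merged)) + "\n"


# Placeholder versions for illustration only.
PUBLISHED = """\
# versions recorded by a hypothetical release build
grpcio==1.0.0
pandas==2.0.0
"""

constraints = apply_overrides(parse_pins(PUBLISHED), {"grpcio": "1.0.1"})
print(constraints, end="")
```

The resulting text could be saved as `constraints.txt` and passed to pip with `-c`, which bounds what the resolver may pick without changing what PySpark itself declares.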
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Nick
