Agreed: there are multiple issues we need to resolve to make our release process manageable and scalable for the project. This procedural change is not a silver bullet, and agreeing to it doesn't mean that our releases are "fixed". But it's the only change where the solution is a discussion and a vote, not a JIRA issue and a pull request.
Neal

On Tue, Jan 19, 2021 at 6:18 PM Wes McKinney <wesmck...@gmail.com> wrote:
> I'm OK with moving to source-only releases, but we need to take a step back and consider how our CI/CD is failing to notify us in a suitably timely and automated way about the packages being broken. For example, the fact that we had 2 failed RCs as the result of packaging issues points to a broken process.
>
> So there are a couple of issues at play:
>
> * The act of _producing_ the package artifacts should not stop a release vote from proceeding like it does now (the "12 hours" you refer to, which is caused by slow iteration time with Crossbow; this is also a problem, can we not fix this?)
> * We need a better feedback loop to determine whether master is in a releasable state, including all relevant packages
>
> If we commit ourselves to solving one problem but not both, I fear that we will find ourselves suffering from other kinds of problems in future release cycles.
>
> On Tue, Jan 19, 2021 at 5:16 PM Neal Richardson <neal.p.richard...@gmail.com> wrote:
> >
> > Hi all,
> >
> > Over the past year, there's been a lot of discussion around the challenges we face as a project in doing releases. Because they are costly to do, we don't do them often; because we don't do them often, they become even costlier.
> >
> > There are only a small number of people (PMC members with GPG keys registered with ASF) who could possibly serve as release manager, and because of the amount of time required (I saw Krisztián say on the 3.0 release thread something like "I'll start a new rc, it'll be done in 12 hours"), even fewer people could be expected to take on the burden. Indeed, this is Krisztián's 10th release in a row as release manager, and over the course of the project, 2/3 of all release candidates have been made by just 2 people.
> >
> > I'd like to propose a change to our release procedure: instead of having the release candidate vote include Python wheels, Linux system packages, or any other binary packages, we should only vote on the source release. Binary artifacts would be produced as post-release tasks, using the official source release.
> >
> > This would greatly reduce the time and effort it takes to produce a release candidate (tar, sign, and upload, that's it), and it would remove a bunch of points of failure from the release-candidate making process (timeouts, CI flakiness, etc.). It would also mean fewer release-blocking issues: we still have to fix the packaging builds, but doing so can happen in parallel with the verification process. If we found problems in the packaging scripts, fixes could either be applied as patch steps to the binary artifact build scripts or, if fixes can be produced quickly, we could collect them and cut another (cheap) release candidate. Right now, our only option is the latter, which makes for a slow, stressful release process where there are so many places where a simple issue can block the whole release or set us back an additional week (a full day to produce a release candidate plus another three days to vote).
> >
> > If we went in this direction, we could still choose to vote separately on binary packages like wheels, though I'm not sure that's worth the effort. Many of the packages that people use (conda, homebrew, CRAN, etc.) are already "unofficial" releases because they're packaged by someone else, and I don't think the distinction is meaningful to our users.
> >
> > To be clear, this doesn't reduce the general maintenance burden of the project. We still have to monitor nightly builds, fix packaging scripts that break, and deal with CI service interruptions. This change would just reduce the burden on the release manager and allow us to spread the costs of packaging and releasing more broadly. It also resolves questions such as "Why should the Rust release be blocked just because we're having a problem building Python wheels on macOS?"
> >
> > There are also other things we could do that would, on a technical level, make our releases more efficient. Andy Grove's change to how Maven is used in the release process will help, as would a number of CI/CD improvements. I view these as complementary to this proposal, which is a governance question with technical/logistical implications.
> >
> > Thoughts?
> >
> > Neal