How should we indicate whether a JIRA issue is a bug fix that should be included in the next RC, or something else that shouldn't be? Right now I think this is a somewhat manual process: we drop a note on GitHub or Zulip, or the person packaging the RC uses their best judgement. Is that process working OK? Or would it be easier to have a label or version tag of some kind?
On Mon, May 9, 2022 at 4:59 AM Krisztián Szűcs <szucs.kriszt...@gmail.com> wrote:
>
> Hi,
>
> Thanks Raúl for bringing this up since it's an important topic!
> I'd like to provide more context for your proposal and share my
> particular problems with the release process.
>
> On Mon, May 9, 2022 at 2:33 PM Raul Cumplido <r...@voltrondata.com> wrote:
> >
> > Hi,
> >
> > I would like to propose a change in our release process.
> >
> > The rationale for the change is to avoid introducing new issues once a
> > Release Candidate has already been cut, by merging only specific commits
> > into new release candidates.
> >
> > Currently, once a new Release Candidate is required, we drop the previous
> > version branch and create a new Release Candidate from the master branch
> > of the repository [1].
> Actually, dropping the previous "release-<version>" branch is not a
> requirement, but this is indeed not clearly documented in the release
> guidelines.
>
> > The problem with this is that we might introduce new bugs into the
> > release, creating the need to cut further release candidates.
> We introduced the release branches for this exact scenario, so that we
> can create releases independently of the master branch.
>
> > As an example, for release 7.0.0, 10 release candidates were required
> The reason for the notorious 7.0.0-RC10 is different; more on that later.
>
> > and for release 8.0.0 we had to remove a specific commit that
> > introduced some new issues [2]. For release 8.0.0 we were able to find
> > it early, but it could potentially have gone into the release and
> > created the need for further RCs.
> >
> > I would like to propose the following workflow.
> > When creating the initial RC, create both an rc1 branch and the version
> > branch from master.
> > release-x.0.0 and release-x.0.0.rc1
> >
> > If a new RC is required, drop release-x.0.0 (as we do today) and create
> > a new RC branch from the previous RC branch (instead of master), then
> > cherry-pick only the specific commits that have been identified to be part
> > of the new release candidate. We can automate the cherry-pick process via a
> > script that takes the JIRA tickets or commit hashes we want to add
> > to the new release candidate. Once the new RC branch is ready, create a new
> > version branch from it and proceed as today.
> This is why I manually cherry-picked 4 commits from the master branch
> to the new release branch [1], excluding that specific patch.
> Note that there was a single blocker [2], but I still included 3
> additional patches: 2 low-risk bug fixes and a patch for the
> verification.
>
> > The commits added to the release once a release candidate has already
> > been cut will usually be fixes for the release, but could also be features
> > if there is community consensus that a feature must be included in the
> > release.
> I would have also included both the Python UDF [3] and GCS [4] patches,
> since they are really valuable features.
> In the first case we noticed the broken packaging builds from the
> nightly report; this is why I had to cherry-pick commits from
> master rather than cutting RC3 directly from the master branch (there
> is no other difference).
> In the second case the PR simply didn't make it, for the same reason
> [5], which we managed to catch before merging the patch.
>
> > This change will allow us to have more granular control over what goes into
> > the release once a release candidate has been cut, and to speed up the release
> Since your proposal is already implemented, the actionable item I see
> here is to properly document it in the release management guidelines.
>
> > by focusing both the release manager's and the community's efforts and
> > potentially reducing the number of RCs to be created and verified.
> Regarding the notorious 7.0.0-RC10 release candidate: I developed a
> habit of executing the source verification tasks before calling a vote,
> while waiting for the packaging builds to finish. If there is an issue,
> the RC doesn't reach the VOTE phase. I just took a look: the 6th release
> candidate (7.0.0-RC5) was the first one I managed to send out a VOTE
> email for. Out of the 11 release candidates I created for the 7.0.0
> release, only 4 made it to a vote.
>
> Before that release, the number of RC verification crossbow tasks kept
> growing, but we had no way to run them on a nightly basis.
> This meant we were unable to tell whether the verification tasks
> would pass for a given commit, and only noticed issues after creating
> a release candidate.
> Right after the 7.0.0 release we refactored [6] the source
> verification scripts and crossbow tasks to support verifying specific
> git commits, local checkouts and actual release candidates. Since then
> we have had nightly verification builds, so we get notified about
> failing builds, and we didn't even try to create the first release
> candidate while verification tasks were still failing. This was the single
> reason why we didn't have 10+ release candidates this time.
>
> After spending countless sleepless nights on Arrow releases, I'd like
> to raise awareness of three other problems that bother me:
>
> PROBLEM 1: Rush period before the release
> One or two weeks before the release we start to incrementally postpone
> the issues which are unlikely to make it into the release, but there
> are features we would still like to squeeze in. There are too many
> simultaneously moving parts right before the release, possibly
> introducing new issues.
> Since we release many implementations at once
> and there are multiple stakeholders focusing on different features,
> it's generally hard to "reach consensus" about what to exclude and
> what to wait for. We're trying our best to include as much value in
> each release as we can while avoiding significant delays to the
> delivery date.
>
> PROBLEM 2: Decoupled packaging and verification builds
> Due to the on-demand nature of the crossbow tasks, we often forget to
> trigger crossbow builds before merging a PR, resulting in nightly
> failures which we need to fix in follow-up PRs. Ideally, if we were
> able to run all of our builds on all PRs before merging, we
> could keep the master branch in an always-releasable state.
> This is a tradeoff we made to spare CI resources for the apache/arrow
> repository, but soon enough we will reach the capacity limits of
> crossbow as well (for example, I had to manually stop and restart macOS
> crossbow builds during the release process to avoid waiting 12 more
> hours).
>
> PROBLEM 3: Lack of interest in nightly builds despite their importance
> We usually let nightly builds fail continuously for days or even
> weeks, hiding more and more issues over time. This adds up before the
> release, making the rush period even worse. I'm not sure what the
> exact reason is; probably a mixture of the small number of subscribers
> to the builds@ mailing list and the poor readability of the nightly
> reports (which keeps improving thanks to Raúl).
>
> Thanks, Krisztian
>
> [1]: https://github.com/apache/arrow/commits/release-8.0.0
> [2]: https://github.com/apache/arrow/commit/0d30a05212b1448f53233f2ab325924311d76e54
> [3]: https://github.com/apache/arrow/pull/12590
> [4]: https://github.com/apache/arrow/pull/12763
> [5]: https://github.com/apache/arrow/pull/12763#issuecomment-1109022291
> [6]: https://github.com/apache/arrow/pull/12320
>
> >
> > Thanks,
> > Raúl
> >
> > [1] https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide
> > [2] https://github.com/apache/arrow/pull/12590#issuecomment-1116144088
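Raúl's proposal mentions automating the cherry-pick step with a script that takes JIRA tickets or commit hashes. Below is a minimal Python sketch of what such a script might look like. It is not an existing Arrow tool: `select_commits` and `plan_next_rc` are hypothetical names, it assumes commit subjects start with the JIRA key (e.g. `ARROW-NNNN: [Component] ...`), and it assumes the `release-x.0.0.rcN` / `release-x.0.0` branch naming from the proposal. It only prints the git commands instead of running them.

```python
import re
from typing import Iterable, List, Set

# Assumption: commit subjects start with the JIRA key followed by a colon,
# e.g. "ARROW-16123: [C++] Fix ...". Subjects without a key ("MINOR: ...")
# are never selected.
JIRA_KEY = re.compile(r"^([A-Z][A-Z0-9]*-\d+):")


def select_commits(log_lines: Iterable[str], wanted: Set[str]) -> List[str]:
    """Return hashes of commits whose subject starts with a wanted JIRA key.

    Each line is "<hash> <subject>", oldest first, e.g. the output of
    `git log --reverse --format='%H %s' master`. The oldest-first order is
    kept so the cherry-picks apply in the same sequence as on master.
    """
    picked = []
    for line in log_lines:
        commit, _, subject = line.strip().partition(" ")
        match = JIRA_KEY.match(subject)
        if match and match.group(1) in wanted:
            picked.append(commit)
    return picked


def plan_next_rc(version: str, rc: int, commits: List[str]) -> List[str]:
    """Return (without running) the git commands to cut RC number `rc`.

    Hypothetical branch naming taken from the proposal: release-<version>.rc<N>
    for RC branches and release-<version> for the version branch.
    """
    if rc < 2:
        raise ValueError("rc1 is cut from master; this only plans rc2 and later")
    prev_branch = f"release-{version}.rc{rc - 1}"
    new_branch = f"release-{version}.rc{rc}"
    # Branch off the previous RC instead of master.
    commands = [f"git checkout -b {new_branch} {prev_branch}"]
    if commits:
        # -x appends "(cherry picked from commit ...)" to each commit message.
        commands.append("git cherry-pick -x " + " ".join(commits))
    # Re-point the version branch at the new RC branch.
    commands.append(f"git branch -f release-{version} {new_branch}")
    return commands


if __name__ == "__main__":
    # Hypothetical log excerpt; the JIRA keys and hashes are made up.
    log = [
        "a1b2c3 ARROW-16001: [C++] Fix crash in scanner",
        "d4e5f6 ARROW-16002: [Python] Add a new feature",
        "0a9b8c ARROW-16003: [Dev] Fix verification script",
    ]
    picks = select_commits(log, {"ARROW-16001", "ARROW-16003"})
    for command in plan_next_rc("8.0.0", 2, picks):
        print(command)
```

Run as a script, this prints `git checkout -b release-8.0.0.rc2 release-8.0.0.rc1`, then `git cherry-pick -x a1b2c3 0a9b8c`, then `git branch -f release-8.0.0 release-8.0.0.rc2`, skipping the feature commit that was not requested. Emitting the commands rather than executing them leaves the release manager free to review the list and resolve any cherry-pick conflicts by hand.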