How should we indicate whether a JIRA is a bugfix that should be
included in the next RC, or something else that shouldn't be?
Right now I think this is a somewhat manual process: someone drops a
note on GitHub or Zulip, or the person packaging the RC uses their
best judgement.  Is that process working OK?  Or would it be easier
to have a label or version tag of some kind?
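
If we did go with a label or a fix-version tag, the cherry-pick script
Raúl mentions below could consume it directly. Here is a rough sketch in
Python of what I have in mind, not a finished tool. It assumes commit
subjects on master start with the JIRA key (e.g. "ARROW-12345: [C++] ...")
and that the list of tagged keys has already been fetched from JIRA
somehow. The function names are made up for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: commit subjects begin with the JIRA key, e.g. "ARROW-12345: ...".
JIRA_KEY = re.compile(r"^(ARROW-\d+)\b")


def select_commits(log_lines: Iterable[str],
                   wanted_keys: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (sha, jira_key) pairs for commits tagged for the next RC.

    log_lines is expected to be the output of something like
    `git log --reverse --format='%H %s' <previous-rc-branch>..master`,
    i.e. oldest commit first, so the result is already in cherry-pick order.
    """
    wanted = set(wanted_keys)
    picked = []
    for line in log_lines:
        sha, _, subject = line.partition(" ")
        match = JIRA_KEY.match(subject)
        if match and match.group(1) in wanted:
            picked.append((sha, match.group(1)))
    return picked


def cherry_pick_plan(prev_branch: str, new_branch: str,
                     commits: List[Tuple[str, str]]) -> List[str]:
    """Build the git commands; nothing is executed here."""
    commands = [f"git checkout -b {new_branch} {prev_branch}"]
    # -x records the original commit hash in the cherry-picked message.
    commands += [f"git cherry-pick -x {sha}" for sha, _ in commits]
    return commands
```

The sketch only prints a plan on purpose, so the release manager can
review the list of commands before running anything. Commits without a
JIRA key in the subject are simply skipped.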

On Mon, May 9, 2022 at 4:59 AM Krisztián Szűcs
<szucs.kriszt...@gmail.com> wrote:
>
> Hi,
>
> Thanks Raúl for bringing this up since it's an important topic!
> I'd like to provide more context for your proposal and share my
> particular problems with the release process.
>
> On Mon, May 9, 2022 at 2:33 PM Raul Cumplido <r...@voltrondata.com> wrote:
> >
> > Hi,
> >
> > I would like to propose a change in our release process.
> >
> > The rationale for the change is to avoid introducing new issues once a
> > Release Candidate has already been cut by only merging specific commits to
> > new release candidates.
> >
> > Currently once a new Release Candidate is required we drop the previous
> > version branch and create a new Release Candidate from the master branch on
> > the repository [1].
> Actually dropping the previous "release-<version>" branch is not a
> requirement, but it's indeed not clearly documented in the release
> guidelines.
>
> > This has the problem that we might introduce new bugs
> > to the release, creating the need to cut further release candidates.
> We introduced the release branches for this exact scenario, so we can
> create releases independently from the master branch.
>
> > As an example, for the release 7.0.0, 10 release candidates were required
> The reason for the notorious 7.0.0-RC10 is different, more on that later.
>
> > and for the release 8.0.0 there was the need to remove a specific commit
> > that introduced some new issues [2]. For the release 8.0.0 we were able to
> > find it early, but it could have slipped in and created the need for
> > further RCs.
> >
> > I would like to propose the following workflow.
> > When creating the initial RC, create both an rc1 branch and the version
> > branch from master.
> > release-x.0.0 and release-x.0.0.rc1
> >
> > If a new RC is required, drop the release-x.0.0 (as we do today) and create
> > a new RC branch from the previous RC branch (instead of master), then
> > cherry pick only the specific commits that have been identified to be part
> > of the new release candidate. We can automate the cherrypick process via a
> > script specifying the JIRA tickets or the commit hashes that we want to add
> > to the new release candidate. Once the new RC branch is ready, create a new
> > version branch from it and proceed as today.
> This is why I manually cherry-picked 4 commits from the master branch
> to the new release branch [1] excluding that specific patch.
> Note that there was a single blocker [2], but I still included 3
> additional patches: 2 low-risk bug fixes and a patch for the
> verification.
>
> > The commits to be added to the release once a release candidate has already
> > been cut will usually be fixes for the release but could also be features
> > if there is community consensus that a feature must be introduced to the
> > release.
> I'd have also included both the python UDF [3] and GCS [4] patches
> since they are really valuable features.
> In the first case we noticed the broken packaging builds from the
> nightly report, this is why I had to cherry-pick commits from the
> master rather than cutting RC3 directly from the master branch (there
> is no other difference).
> In the second case the PR simply didn't make it due to the same reason
> [5] which we managed to catch before merging the patch.
>
> > This change will allow us to have a more granular control of what goes in
> > the release once a release candidate has been cut and speed up the release
> Since your proposal is already implemented, the actionable item I see
> here is to properly document it in the release management guidelines.
>
> > by focusing both the release manager's and the community's efforts and
> > potentially reducing the number of RCs to be created and verified.
> Regarding the notorious 7.0.0-RC10 release candidate: I developed a
> habit of running the source verification tasks before calling a vote,
> while waiting for the packaging builds to finish. If there is an issue,
> the candidate doesn't reach the VOTE phase. I just took a look and the
> 6th release candidate (7.0.0-RC5) was the first one I managed to send
> out a VOTE email for. Out of the 11 release candidates I created for
> the 7.0.0 release, only 4 made it to a vote.
>
> Before that release the number of RC verification crossbow tasks kept
> growing, but we had no way to run them on a nightly basis.
> That meant we could not tell whether the verification tasks would
> pass for a given commit, and we only noticed issues after creating
> a release candidate.
> Right after the 7.0.0 release we refactored [6] the source
> verification scripts and crossbow tasks to support verifying specific
> git commits, local checkouts and actual release candidates. Since then
> we have had nightly verification builds, so we get notified about
> failing builds, and we didn't even try to create the first release
> candidate while verification tasks were still failing. This was the
> single reason why we didn't have 10+ release candidates this time.
>
>
> After spending countless sleepless nights with arrow releases I'd like
> to raise awareness of three other problems bothering me:
>
> PROBLEM 1: Rush period before the release:
> One or two weeks before the release we start to incrementally postpone
> the issues which are unlikely to make it into the release but there
> are features we would still like to squeeze in. There are too many
> simultaneously moving parts right before the release, possibly
> introducing new issues. Since we release many implementations at once
> and there are multiple stakeholders focusing on different features
> it's generally hard to "reach consensus" about what to exclude and
> what to wait for. We're trying our best to include as much value in
> each release as we can while trying to avoid significant delays in
> the delivery date.
>
> PROBLEM 2: Decoupled packaging and verification builds
> Due to the on-demand nature of the crossbow tasks we often forget to
> trigger crossbow builds before merging a PR, resulting in nightly
> failures which we need to fix in follow-up PRs. Ideally, if we were
> able to run all of our builds on all of the PRs before merging, we
> could keep the master branch in an always-releasable state.
> This is a tradeoff we made to spare CI resources for the apache/arrow
> repository but soon enough we will reach the capacity limits of
> crossbow as well (for example I had to manually stop-and-restart macOS
> crossbow builds during the release process to avoid waiting 12 more
> hours).
>
> PROBLEM 3: Lack of interest in nightly builds despite their importance
> We usually let nightly builds fail continuously for days or even
> weeks, hiding more and more issues over time. This adds up before the
> release, making the rush period even worse. I'm not sure what the
> exact reason is; probably a mixture of just a few subscribers to the
> builds@ mailing list and the poor readability of nightly reports
> (which keeps improving thanks to Raúl).
>
> Thanks, Krisztian
>
> [1]: https://github.com/apache/arrow/commits/release-8.0.0
> [2]: https://github.com/apache/arrow/commit/0d30a05212b1448f53233f2ab325924311d76e54
> [3]: https://github.com/apache/arrow/pull/12590
> [4]: https://github.com/apache/arrow/pull/12763
> [5]: https://github.com/apache/arrow/pull/12763#issuecomment-1109022291
> [6]: https://github.com/apache/arrow/pull/12320
> >
> > Thanks,
> > Raúl
> >
> > [1] https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide
> > [2] https://github.com/apache/arrow/pull/12590#issuecomment-1116144088
