I love this idea. It can easily feel like bugs filed for Jenkins flakes/failures just get lost if there is no process for looking them over regularly.
I would suggest that test failures / flakes all get filed with Fix Version = whatever release is next. Then at release time we can triage the list, making sure none might be a symptom of something that should block the release.

One modification to your proposal: after manual verification that it is safe to release, I would move Fix Version to the next release instead of closing, unless the issue really is fixed or otherwise not reproducible.

For automation, I wonder if there's something automatic already available somewhere that would:

 - mark the Jenkins build "Keep This Build Forever"
 - be *very* careful to try to find an existing bug, else it will be spam
 - file bugs to the "test-failures" component
 - set Fix Version to the "next" release -- right now we have 2.7.1 (LTS), 2.11.0 (next mainline), and 3.0.0 (dreamy incompatible ideas), so it needs the smarts to choose 2.11.0

If not, I think doing this stuff manually is not that bad, assuming we can stay fairly green.

Kenn

On Mon, Jan 7, 2019 at 3:20 PM Sam Rohde <sro...@google.com> wrote:

> Hi All,
>
> There are a number of tests in our system that are either flaky or
> permanently red. I am suggesting that we add most, if not all, of the
> tests (style, unit, integration, etc.) to the release validation step.
> In this way, we will add a regular cadence to ensuring greenness and
> the absence of flaky tests in Beam.
>
> There are a number of ways of implementing this, but what I think might
> work best is to set up a process that either manually or automatically
> creates a JIRA for the failing test and assigns it to a component tagged
> with the release number. The release can then continue when all JIRAs
> are closed, either by fixing the failure or by manually verifying that
> there are no adverse side effects (in case there are environmental
> issues in the testing infrastructure or otherwise).
>
> Thanks for reading, what do you think?
> - Is there another, easier way to ensure that no test failures go unfixed?
> - Can the process be automated?
> - What am I missing?
>
> Regards,
> Sam
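
The automation wishlist above could be sketched roughly as follows. This is only a sketch: the JIRA project key "BEAM", the "test-failures" component name, and the version-picking heuristic are assumptions, not confirmed project policy, and the REST endpoints noted in the comments would need to be checked against the actual Jenkins/JIRA deployments.

```python
# Rough sketch of the filing automation discussed above. Assumptions
# (not verified against Beam's real setup): project key "BEAM", a
# "test-failures" component, and the version-picking heuristic below.
#
# The surrounding REST steps would be roughly:
#   1. POST <jenkins-build-url>/toggleLogKeep   (mark "Keep This Build Forever")
#   2. GET  <jira>/rest/api/2/search?jql=...    (dedup first, else spam)
#   3. POST <jira>/rest/api/2/issue             (file only if nothing was found)

def choose_next_mainline(versions):
    """Pick the next mainline Fix Version from the unreleased versions.

    Heuristic (an assumption, not project policy): drop patch releases
    (z > 0, e.g. the 2.7.x LTS line) and the highest major (the "dreamy
    incompatible ideas" release), then take the lowest that remains.
    """
    parsed = sorted(tuple(int(p) for p in v.split(".")) for v in versions)
    top_major = max(p[0] for p in parsed)
    candidates = [p for p in parsed if p[2] == 0 and p[0] < top_major]
    return ".".join(str(n) for n in min(candidates or parsed))


def dedup_jql(test_name, component="test-failures"):
    """JQL for the 'find an existing bug first, else it will be spam' step."""
    safe = test_name.replace('"', '\\"')
    return (f'project = BEAM AND component = "{component}" '
            f'AND status != Closed AND summary ~ "{safe}"')
```

With the versions from the example above, choose_next_mainline(["2.7.1", "2.11.0", "3.0.0"]) picks "2.11.0", and the JQL query gives the bot a chance to update an existing bug instead of filing a duplicate.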