This is a good idea. Some suggestions:
- It would be nice if we could figure out a process to act on flaky tests more frequently than once per release.
- Another improvement to the process would be having actual owners of issues rather than auto-assigned component owners. A few folks have 100+ assigned issues. Unassigning those issues and finding owners who have time to work on the identified flaky tests would be helpful.
On Mon, Jan 7, 2019 at 3:45 PM Kenneth Knowles <k...@apache.org> wrote:
> I love this idea. It can easily feel like bugs filed for Jenkins flakes/failures just get lost if there is no process for looking them over regularly.
>
> I would suggest that test failures / flakes all get filed with Fix Version = whatever release is next. Then at release time we can triage the list, making sure none might be a symptom of something that should block the release. One modification to your proposal is that after manual verification that it is safe to release I would move Fix Version to the next release instead of closing, unless the issue really is fixed or otherwise not reproducible.
>
> For automation, I wonder if there's something automatic already available somewhere that would:
>
> - mark the Jenkins build to "Keep This Build Forever"
> - be *very* careful to try to find an existing bug, else it will be spam
> - file bugs to "test-failures" component
> - set Fix Version to the "next" - right now we have 2.7.1 (LTS), 2.11.0 (next mainline), 3.0.0 (dreamy incompatible ideas) so need the smarts to choose 2.11.0
>
> If not, I think doing this stuff manually is not that bad, assuming we can stay fairly green.
>
> Kenn
>
> On Mon, Jan 7, 2019 at 3:20 PM Sam Rohde <sro...@google.com> wrote:
>
>> Hi All,
>>
>> There are a number of tests in our system that are either flaky or permanently red. I am suggesting to add, if not all, then most of the tests (style, unit, integration, etc) to the release validation step. In this way, we will add a regular cadence to ensuring greenness and no flaky tests in Beam.
>>
>> There are a number of ways of implementing this, but what I think might work the best is to set up a process that either manually or automatically creates a JIRA for the failing test and assigns it to a component tagged with the release number. The release can then continue when all JIRAs are closed by either fixing the failure or manually testing to ensure no adverse side effects (this is in case there are environmental issues in the testing infrastructure or otherwise).
>>
>> Thanks for reading, what do you think?
>> - Is there another, easier way to ensure that no test failures go unfixed?
>> - Can the process be automated?
>> - What am I missing?
>>
>> Regards,
>> Sam
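On the "need the smarts to choose 2.11.0" point in Kenn's automation list: a minimal Python sketch, assuming the unreleased versions are available as plain "major.minor.patch" strings and that the next mainline release is the lowest unreleased version with patch number 0 on the current major line. That rule, and the names `parse_version` and `pick_next_fix_version`, are hypothetical illustrations, not part of any existing Beam, JIRA or Jenkins tooling.

```python
from typing import Iterable, Optional, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse a "major.minor.patch" string such as "2.11.0" into an int tuple."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def pick_next_fix_version(unreleased: Iterable[str], mainline_major: int) -> Optional[str]:
    """Pick the next mainline release among the unreleased versions (hypothetical rule).

    Skips patch releases (e.g. an LTS line such as 2.7.1) and versions on a
    different major line (e.g. 3.0.0), then returns the lowest remaining one,
    or None if nothing qualifies.
    """
    candidates = [
        v for v in unreleased
        if parse_version(v)[2] == 0 and parse_version(v)[0] == mainline_major
    ]
    if not candidates:
        return None
    # Compare numerically: as plain strings, "2.11.0" would sort before "2.9.0".
    return min(candidates, key=parse_version)


if __name__ == "__main__":
    print(pick_next_fix_version(["2.7.1", "2.11.0", "3.0.0"], mainline_major=2))
```

With the versions from the thread this prints `2.11.0`. The major line is passed in explicitly rather than inferred, so a speculative 3.0.0 is never picked by accident.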
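The release gate from Sam's proposal, combined with Kenn's modification (after manual verification, move Fix Version to the next release instead of closing), could be modelled as below. This is a sketch under stated assumptions only: `FailureIssue` is a hypothetical in-memory stand-in for a JIRA issue in the "test-failures" component, the `verified_safe` flag represents "manually tested, no adverse side effects", and the issue keys and version numbers are made up. It does not call the JIRA API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailureIssue:
    """Hypothetical stand-in for a JIRA issue filed for a failing or flaky test."""
    key: str
    fix_version: str
    resolved: bool = False        # the failure was actually fixed / not reproducible
    verified_safe: bool = False   # manually verified as safe to release with


def triage_for_release(issues: List[FailureIssue], release: str,
                       next_release: str) -> List[FailureIssue]:
    """Return the issues that still block `release`.

    Verified-but-unfixed issues are not closed; their Fix Version is rolled
    forward to `next_release` and the verification flag is reset so they get
    looked at again next time.
    """
    blockers = []
    for issue in issues:
        if issue.fix_version != release or issue.resolved:
            continue  # targeted at another release, or already fixed
        if issue.verified_safe:
            issue.fix_version = next_release
            issue.verified_safe = False
        else:
            blockers.append(issue)
    return blockers


if __name__ == "__main__":
    issues = [
        FailureIssue("EXAMPLE-1", "2.11.0", resolved=True),
        FailureIssue("EXAMPLE-2", "2.11.0", verified_safe=True),
        FailureIssue("EXAMPLE-3", "2.11.0"),
        FailureIssue("EXAMPLE-4", "2.12.0"),
    ]
    remaining = triage_for_release(issues, "2.11.0", "2.12.0")
    print([issue.key for issue in remaining])
```

Here the release can proceed only when the returned list is empty. Rolling issues forward rather than closing them keeps unfixed flakes visible at every release instead of letting them disappear.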