+1 for removing the flaky category and fixing failures as they occur.

On Thu, Jul 5, 2018 at 8:21 PM Dan Smith <dsm...@pivotal.io> wrote:
> Honestly I've never liked the flaky category. What it means is that at
> some point in the past, we decided to put off tracking down and fixing
> a failure, and now we're left with a bug number and a description and
> that's it.
>
> I think we will be better off if we just get rid of the flaky category
> entirely. That way no one can label anything else as flaky and push it
> off for later, and if flaky tests fail again we will actually
> prioritize and fix them instead of ignoring them.
>
> I think Patrick was looking at rerunning the flaky tests to see what
> is still failing. How about we just run the whole flaky suite some
> number of times (100?), fix whatever is still failing, and close out
> and remove the category from the rest?
>
> I think we will get more benefit from shaking out and fixing the
> issues we have in the current codebase than we will from carefully
> explaining the flaky failures from the past.
>
> -Dan
>
> On Thu, Jul 5, 2018 at 7:03 PM, Dale Emery <dem...@pivotal.io> wrote:
>
> > Hi Alexander and all,
> >
> > > On Jul 5, 2018, at 5:11 PM, Alexander Murmann <amurm...@pivotal.io>
> > > wrote:
> > >
> > > Hi everyone!
> > >
> > > Dan Smith started a discussion about shaking out more flaky DUnit
> > > tests. That's a great effort and I am happy it's happening.
> > >
> > > As a corollary to that conversation, I wonder what the criteria
> > > should be for a test to no longer be considered flaky and have the
> > > category removed.
> > >
> > > In general the bar should be fairly high. Even if a test only
> > > fails ~1 in 500 runs, that's still a problem given how many tests
> > > we have.
> > >
> > > I see two ends of the spectrum:
> > > 1. We have a good understanding of why the test was flaky and
> > > think we fixed it.
> > > 2. We have a hard time reproducing the flaky behavior and have no
> > > good theory as to why the test might have shown flaky behavior.
> > >
> > > In the first case I'd suggest running the test ~100 times to get a
> > > little more confidence that we fixed the flaky behavior, and then
> > > removing the category.
> >
> > Here's a test for case 1:
> >
> > If we really understand why it was flaky, we will be able to:
> > - Identify the "faults"—the broken places in the code (whether
> >   system code or test code).
> > - Identify the exact conditions under which those faults led to the
> >   failures we observed.
> > - Explain how those faults, under those conditions, led to those
> >   failures.
> > - Run unit tests that exercise the code under those same conditions,
> >   and demonstrate that the formerly broken code now does the right
> >   thing.
> >
> > If we're lacking any of these things, I'd say we're dealing with
> > case 2.
> >
> > > The second case is a lot more problematic. How often do we want to
> > > run a test like that before we decide that it might have been
> > > fixed since we last saw it happen? Anything else we could/should
> > > do to verify the test deserves our trust again?
> >
> > I would want a clear, compelling explanation of the failures we
> > observed.
> >
> > Clear and compelling are subjective, of course. For me, clear and
> > compelling would include descriptions of:
> > - The faults in the code. What, specifically, was broken.
> > - The specific conditions under which the code did the wrong thing.
> > - How those faults, under those conditions, led to those failures.
> > - How the fix either prevents those conditions, or causes the
> >   formerly broken code to now do the right thing.
> >
> > Even if we don't have all of these elements, we may have some of
> > them. That can help us calibrate our confidence. But the elements
> > work together. If we're lacking one, the others are shaky, to some
> > extent.
> >
> > The more elements are missing from our explanation, the more times
> > I'd want to run the test before trusting it.
> >
> > Cheers,
> > Dale
> >
> > —
> > Dale Emery
> > dem...@pivotal.io

--
Cheers
Jinmei
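[Editor's sketch] To make the "run the test ~100 times" suggestion above
concrete: one way to repeat a single test in-process is a JUnit 4
TestRule (Geode's tests use JUnit 4). This is a minimal sketch;
RepeatRule and formerlyFlakyBehavior are hypothetical names for
illustration, not Geode APIs.

    import static org.junit.Assert.assertTrue;

    import org.junit.Rule;
    import org.junit.Test;
    import org.junit.rules.TestRule;
    import org.junit.runner.Description;
    import org.junit.runners.model.Statement;

    public class RepeatRuleExample {

      /** Repeats each test body N times; the first failure fails the test. */
      public static class RepeatRule implements TestRule {
        private final int times;

        public RepeatRule(int times) {
          this.times = times;
        }

        @Override
        public Statement apply(Statement base, Description description) {
          return new Statement() {
            @Override
            public void evaluate() throws Throwable {
              for (int i = 0; i < times; i++) {
                base.evaluate(); // rethrows on the first failing iteration
              }
            }
          };
        }
      }

      // Repeat every test in this class 100 times.
      @Rule
      public RepeatRule repeat = new RepeatRule(100);

      @Test
      public void formerlyFlakyBehavior() {
        // Placeholder for the assertion that used to fail intermittently.
        assertTrue(true);
      }
    }

Note this repeats within one JVM. For DUnit tests whose flakiness
depends on fresh VMs, ports, or inter-VM timing, looping the whole
suite at the build level is the closer match to what Dan describes.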