Perhaps flaky tests need to be handled differently. Is there a way to
build a statistical model of the current flakiness of each test that we
can then use during testing to decide whether to accept a failure? If an
acceptable level of flakiness is established for a test, then when it
fails we would rerun it one or more times to gather a sample and check
that the observed failure rate is not significantly worse than that
baseline.
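
Something like the following is what I have in mind - a rough sketch
only; the class name, the binomial model and the thresholds are my own
assumptions, not existing tooling:

// Given a known baseline flake rate for a test, rerun it after a failure
// and only treat the result as a real regression if the observed failure
// count is statistically unlikely under that baseline.
public final class FlakinessGate
{
    /**
     * @param baselineFlakeRate historical failure probability (e.g. 0.01 = 1 in 100 runs)
     * @param reruns            number of extra runs performed after the initial failure
     * @param failures          how many of those reruns failed
     * @param alpha             significance level, e.g. 0.05
     * @return true if the failures are consistent with the known flakiness
     */
    public static boolean acceptAsFlaky(double baselineFlakeRate, int reruns, int failures, double alpha)
    {
        // one-sided p-value: P(X >= failures) for X ~ Binomial(reruns, baselineFlakeRate)
        double pValue = 0.0;
        for (int k = failures; k <= reruns; k++)
            pValue += binomialCoefficient(reruns, k)
                      * Math.pow(baselineFlakeRate, k)
                      * Math.pow(1 - baselineFlakeRate, reruns - k);
        // a large p-value means this many failures is unsurprising for a test this flaky
        return pValue >= alpha;
    }

    private static double binomialCoefficient(int n, int k)
    {
        double c = 1.0;
        for (int i = 1; i <= k; i++)
            c = c * (n - k + i) / i;
        return c;
    }
}

For example, with a baseline of 1 failure in 100 runs, 2 failures across
100 reruns would be accepted as flakiness (p is roughly 0.26), while 8
failures would not (p is far below 0.05) and would be treated as a real
regression.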



On Wed, Aug 10, 2022 at 8:51 AM Benedict Elliott Smith <bened...@apache.org>
wrote:

> 
> > We can start by putting the bar at a lower level and raise the level
> over time
>
> +1
>
> > One simple approach that has been mentioned several times is to run the
> new tests added by a given patch in a loop using one of the CircleCI tasks
>
> I think if we want to do this, it should be extremely easy - by which I
> mean automatic, really. This shouldn’t be too tricky I think? We just need
> to produce a diff of new test classes and methods within existing classes.
> If there doesn’t already exist tooling to do this, I can probably help out
> by putting together something to output @Test annotated methods within a
> source tree, if others are able to turn this into a part of the CircleCI
> pre-commit task (i.e. to pick the common ancestor with trunk, 4.1 etc, and
> run this task for each of the outputs). We might want to start
> standardising branch naming structures to support picking the upstream
> branch.
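
For illustration, a very rough sketch of what "something to output @Test
annotated methods within a source tree" could look like - the class name
and the regex-based approach below are purely my assumption (real tooling
would more likely use annotation processing or bytecode inspection):

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Walks a source tree and prints Class#method for every @Test-annotated
// method, so that the output for a patch branch and for its common ancestor
// with trunk can be diffed to find the tests added by the patch.
public class ListTestMethods
{
    private static final Pattern TEST_METHOD =
        Pattern.compile("@Test\\b[\\s\\S]*?\\bvoid\\s+(\\w+)\\s*\\(");

    public static void main(String[] args) throws IOException
    {
        Path root = Paths.get(args.length > 0 ? args[0] : "test");
        try (Stream<Path> files = Files.walk(root))
        {
            files.filter(p -> p.toString().endsWith(".java"))
                 .forEach(ListTestMethods::printTests);
        }
    }

    private static void printTests(Path file)
    {
        try
        {
            String source = Files.readString(file);
            String className = file.getFileName().toString().replace(".java", "");
            Matcher m = TEST_METHOD.matcher(source);
            while (m.find())
                System.out.println(className + "#" + m.group(1));
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }
}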
>
> > We should also probably revert newly committed patches if we detect that
> they introduced flakies.
>
> There should be a strict time limit for reverting a patch for this reason,
> as environments change and what is flaky now was not necessarily flaky
> before.
>
> On 9 Aug 2022, at 12:57, Benjamin Lerer <ble...@apache.org> wrote:
>
> At this point it is clear that we will probably never be able to remove
> every last bit of flakiness from our tests. For me the questions are:
> 1) Where do we draw the line for a release? and 2) How do we maintain that
> line over time?
>
> In my opinion, not all flakies are equal. Some fail every 10 runs, some
> fail 1 in 1,000 runs. I would personally draw the line based on that
> metric. With the CircleCI tasks that Andres has added we can easily get
> that information for a given test.
> We can start by putting the bar at a lower level and raise the level over
> time when most of the flakies that we hit are above that level.
>
> At the same time we should make sure that we do not introduce new flakies.
> One simple approach that has been mentioned several times is to run the new
> tests added by a given patch in a loop using one of the CircleCI tasks.
> That would allow us to minimize the risk of introducing flaky tests. We
> should also probably revert newly committed patches if we detect that they
> introduced flakies.
>
> What do you think?
>
>
>
>
>
> On Sun, Aug 7, 2022 at 12:24, Mick Semb Wever <m...@apache.org> wrote:
>
>>
>>
>>> With that said, I guess we can just revise on a regular basis what
>>> exactly the remaining flakies are, rather than numbers, which also change
>>> quickly up and down with the first change in the Infra.
>>>
>>
>>
>> +1, I am in favour of taking a pragmatic approach.
>>
>> If flakies are identified and triaged enough that, with correlation from
>> both CI systems, we are confident that no legit bugs are behind them, I'm
>> in favour of going beta.
>>
>> I still remain in favour of somehow incentivising the reduction of other
>> flakies as well. Flakies that expose poor/limited CI infra, and/or tests
>> that are not as resilient as they could be, are still noise that indirectly
>> reduces our QA (and increases the effort to find and tackle those legit
>> runtime problems). Interested in hearing input from others here who have
>> been spending a lot of time on this front.
>>
>> Could it work if we say: all flakies must be ticketed, and test/infra
>> related flakies do not block a beta release so long as there are fewer than
>> in the previous release? The intent here is to be pragmatic while keeping
>> us on a "keep the campground cleaner" trajectory…
>>
>>
>
