> “We can start by putting the bar at a lower level and raise the level over 
> time when most of the flakies that we hit are above that level.”
> My only concern is who will track that, and how.
What's Butler's logic for flagging things flaky? Maybe a "flaky low" vs. "flaky 
high" distinction based on failure frequency (or some much better name I'm sure 
someone else will come up with) could make sense? Then we could focus our 
efforts on the ones that are flagged as failing at whatever high water mark 
threshold we set.

It'd be trivial for me to update the script that parses test failure output for 
JIRA updates to flag things based on their failure frequency.
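
To make that concrete, here's a rough sketch of the kind of bucketing I mean. 
This is not Butler's logic and not the existing JIRA script; the names 
(RunHistory, classify, HIGH_WATER_MARK) and the 1% threshold are made up for 
illustration, and the run/failure counts would have to come from wherever we 
already collect them.

```python
from dataclasses import dataclass

# Hypothetical high water mark: failure rates at or above this count as
# "flaky high". 0.01 (1 failure per 100 runs) is a made-up starting value.
HIGH_WATER_MARK = 0.01


@dataclass
class RunHistory:
    """Run and failure counts for a single test (illustrative structure)."""
    name: str
    runs: int
    failures: int

    @property
    def failure_rate(self) -> float:
        # Guard against tests with no recorded runs.
        return self.failures / self.runs if self.runs else 0.0


def classify(history: RunHistory, threshold: float = HIGH_WATER_MARK) -> str:
    """Return 'stable', 'flaky low', or 'flaky high' for one test."""
    if history.failures == 0:
        return "stable"
    if history.failure_rate >= threshold:
        return "flaky high"
    return "flaky low"


if __name__ == "__main__":
    samples = [
        RunHistory("fails_1_in_10", runs=1000, failures=100),
        RunHistory("fails_1_in_1000", runs=1000, failures=1),
        RunHistory("never_fails", runs=1000, failures=0),
    ]
    for h in samples:
        print(f"{h.name}: {classify(h)} ({h.failure_rate:.2%})")
```

Raising the bar over time would then just mean lowering the threshold passed 
to classify() once most of the "flaky high" tests have been dealt with.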

On Tue, Aug 9, 2022, at 5:24 PM, Ekaterina Dimitrova wrote:
> “In my opinion, not all flakies are equal. Some fail every 10 runs, some 
> fail 1 in 1000 runs.”
> Agreed, for all flakies that are not new tests/regressions and are also not 
> infra related.
> 
> “We can start by putting the bar at a lower level and raise the level over 
> time when most of the flakies that we hit are above that level.”
> My only concern is who will track that, and how.
> Also, a metric for non-infra issues, I guess.
> 
> “At the same time we should make sure that we do not introduce new flakies. 
> One simple approach that has been mentioned several times is to run the new 
> tests added by a given patch in a loop using one of the CircleCI tasks.”
> +1, I personally find this very valuable and more efficient than bisecting 
> and going back to work done, in some cases, months ago.
> 
> 
> “We should also probably revert newly committed patches if we detect that 
> they introduced flakies.”
> +1, not that I like my patches to be reverted, but it seems the fairest 
> way to stick to our stated goals. But I think last time we talked about 
> reverting, we discussed it only for trunk? Or do I remember it wrong?
> 
> 
> 
> On Tue, 9 Aug 2022 at 7:58, Benjamin Lerer <ble...@apache.org> wrote:
>> At this point it is clear that we will probably never be able to remove some 
>> level of flakiness from our tests. For me the questions are: 1) Where do we 
>> draw the line for a release? and 2) How do we maintain that line over time?
>> 
>> In my opinion, not all flakies are equal. Some fail every 10 runs, some 
>> fail 1 in 1000 runs. I would personally draw the line based on that 
>> metric. With the CircleCI tasks that Andres has added we can easily get that 
>> information for a given test.
>> We can start by putting the bar at a lower level and raise the level over 
>> time when most of the flakies that we hit are above that level.
>> 
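>> At the same time we should make sure that we do not introduce new flakies. 
>> One simple approach that has been mentioned several times is to run the new 
>> tests added by a given patch in a loop using one of the CircleCI tasks. 
>> That would allow us to minimize the risk of introducing flaky tests. We 
>> should also probably revert newly committed patches if we detect that they 
>> introduced flakies.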
>> 
>> What do you think?
>> 
>> 
>> 
>> 
>> 
>> On Sun, 7 Aug 2022 at 12:24, Mick Semb Wever <m...@apache.org> wrote:
>>> 
>>> 
>>>> With that said, I guess we can just review on a regular basis exactly 
>>>> which flakies remain, rather than the numbers, which also swing up and 
>>>> down quickly with the first change in the infra. 
>>>> 
>>> 
>>> 
>>> +1, I am in favour of taking a pragmatic approach.
>>> 
>>> If flakies are identified and triaged enough that, with correlation from 
>>> both CI systems, we are confident that no legit bugs are behind them, I'm 
>>> in favour of going beta.
>>> 
>>> I still remain in favour of somehow incentivising reducing other flakies as 
>>> well. Flakies that expose poor/limited CI infra, and/or tests that are not 
>>> as resilient as they could be, are still noise that indirectly reduces our 
>>> QA (and increase efforts to find and tackle those legit runtime problems). 
>>> Interested in hearing input from others here that have been spending a lot 
>>> of time on this front. 
>>> 
>>> Could it work if we say: all flakies must be ticketed, and test/infra 
>>> related flakies do not block a beta release so long as there are fewer than 
>>> in the previous release? The intent here being pragmatic, but keeping us on a 
>>> "keep the campground cleaner" trajectory… 
