> Is Nightly now using a list of flakes?

Dashboard job was flaky yesterday, so the nightly didn't start using it. Looks like it's working fine now. Let me exclude flakies from the nightly job.
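For the record, here's roughly what I mean by "exclude flakies" -- just a sketch, not the actual job config. It assumes the Find-Flaky-Tests job publishes the flaky list as a plain-text artifact (I'm calling it flakies.txt here, one test class per line) and that the nightly hands surefire an excludes file; the real artifact name and wiring may differ:

    #!/usr/bin/env python3
    # Rough sketch only. Assumes the Find-Flaky-Tests job publishes the flaky
    # list as a plain-text artifact ("flakies.txt", one test class per line);
    # the real artifact name and format may differ.
    import urllib.request

    FLAKY_LIST_URL = ("https://builds.apache.org/job/HBase-Find-Flaky-Tests-"
                      "branch2.0/lastSuccessfulBuild/artifact/flakies.txt")  # hypothetical artifact


    def fetch_flaky_tests(url=FLAKY_LIST_URL):
        """Return the flaky test class names, one per line in the artifact."""
        with urllib.request.urlopen(url) as resp:
            text = resp.read().decode("utf-8")
        return [line.strip() for line in text.splitlines() if line.strip()]


    def write_excludes_file(tests, path="flaky-excludes.txt"):
        """Write surefire-style exclusion patterns, e.g. **/TestMultiParallel.java."""
        with open(path, "w") as out:
            for test in tests:
                # "client.TestMultiParallel" -> "**/TestMultiParallel.java"
                out.write("**/{}.java\n".format(test.split(".")[-1]))


    if __name__ == "__main__":
        flakies = fetch_flaky_tests()
        write_excludes_file(flakies)
        print("Wrote {} exclusion patterns".format(len(flakies)))

The nightly build would then run something like mvn test -Dsurefire.excludesFile=flaky-excludes.txt (surefire supports an excludes file via that property); the actual job may pass the exclusions differently.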
> Just took a look at the dashboard. Does this capture only failed runs or
> all runs?

Sorry, the question isn't clear. Runs of what? Here's an attempt to answer it as best I can understand it: the dashboard looks at the last X (X = 6 right now) runs of the nightly branch-2 job and collects the failing, hanging, and timed-out tests.

> I see that the following tests have failed 100% of the time for the last 30
> runs [1]. If this captures all runs, this isn't truly flaky, but rather a
> legitimate failure, right?
> Maybe this tool is used to see all test failures, but if not, I feel like
> we could/should remove a test from the flaky tests/excludes if it fails
> consistently so we can fix the root cause

This has come up a lot of times before. Yes, you're right: 100% failure = legitimate failure.

<rant>
We as a community suck at tracking nightly runs for failing tests and fixing them, otherwise we wouldn't have ~40 bad tests, right! In fact, we suck at fixing tests even when they're presented in a nice clean list (this dashboard). We just don't prioritize tests in our work. The general attitude is: tests are failing... meh... what's new, they've been failing for years. Instead of: oh, one test failed, find the cause and revert it!
So the real thing to change here is the attitude of the community towards tests. I am +1 for anything that'll promote/support that change.
</rant>

I think we can actually update the script to send a mail to dev@ when it encounters these 100% failing tests. Wanna try? :) (There's a rough sketch of what that could look like at the bottom of this mail, below the quoted thread.)

-- Appy

On Fri, Jan 12, 2018 at 11:29 AM, Zach York <zyork.contribut...@gmail.com> wrote:

> Just took a look at the dashboard. Does this capture only failed runs or
> all runs?
>
> I see that the following tests have failed 100% of the time for the last 30
> runs [1]. If this captures all runs, this isn't truly flaky, but rather a
> legitimate failure, right?
> Maybe this tool is used to see all test failures, but if not, I feel like
> we could/should remove a test from the flaky tests/excludes if it fails
> consistently so we can fix the root cause.
>
> [1]
> master.balancer.TestRegionsOnMasterOptions
> client.TestMultiParallel
> regionserver.TestRegionServerReadRequestMetrics
>
> Thanks,
> Zach
>
> On Fri, Jan 12, 2018 at 8:19 AM, Stack <st...@duboce.net> wrote:
>
> > Dashboard doesn't capture timed out tests, right Appy?
> > Thanks,
> > S
> >
> > On Thu, Jan 11, 2018 at 6:10 PM, Apekshit Sharma <a...@cloudera.com>
> > wrote:
> >
> > > https://builds.apache.org/job/HBase-Find-Flaky-Tests-
> > > branch2.0/lastSuccessfulBuild/artifact/dashboard.html
> > >
> > > @stack: when you branch out branch-2.0, let me know, i'll update the jobs
> > > to point to that branch so that it's helpful in release. Once release is
> > > done, i'll move them back to "branch-2".
> > >
> > > -- Appy

--
-- Appy
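P.S. Since I said "wanna try" above, here's a very rough sketch of the 100%-failure check plus the mail to dev@. The input shape (a map from test name to its results over the last few runs), the sender address, and the localhost SMTP relay are all assumptions for illustration; the real report script already has the per-run results, so most of this would just be the filter and the mail:

    #!/usr/bin/env python3
    # Rough sketch of the "mail dev@ about 100% failing tests" idea. The input
    # shape, sender address, and localhost SMTP relay are assumptions; the real
    # script already has per-run results in some form.
    import smtplib
    from email.mime.text import MIMEText


    def find_always_failing(results_by_test):
        """results_by_test: {"client.TestMultiParallel": ["FAILED", "FAILED", ...], ...}
        Returns tests that failed in every run they appeared in, i.e. not flaky,
        just plain broken."""
        return sorted(
            test for test, runs in results_by_test.items()
            if runs and all(result == "FAILED" for result in runs)
        )


    def mail_dev_list(always_failing, runs_examined,
                      sender="builds@example.org",          # placeholder sender
                      recipient="dev@hbase.apache.org"):
        """Send a plain-text summary; assumes a local SMTP relay is reachable."""
        if not always_failing:
            return  # nothing consistently broken, stay quiet
        body = (
            "The following tests failed in every one of the last %d nightly runs\n"
            "and are likely legitimate failures rather than flakies:\n\n%s\n"
            % (runs_examined, "\n".join("  " + t for t in always_failing))
        )
        msg = MIMEText(body)
        msg["Subject"] = "[NIGHTLY] Tests failing 100% of the time"
        msg["From"] = sender
        msg["To"] = recipient
        with smtplib.SMTP("localhost") as server:
            server.sendmail(sender, [recipient], msg.as_string())

If the filter comes back empty it stays quiet, so it would only nag the list when something is genuinely broken rather than merely flaky.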