Re: Smart Flaky Handler

Stack Mon, 16 May 2016 16:46:26 -0700

Sweet!

On Mon, May 16, 2016 at 4:38 PM, Apekshit Sharma <[email protected]> wrote:


> This mail is to introduce the work to tackle the flaky tests in our build.
>
> *Why is it important?*
> - Our build history sucks: the last 175 post-commit runs failed. We need to
> make it useful.
> - To better understand our code’s testing status and, more importantly, its
> weak points.
> - We know those 2-3 tests which keep failing every now and then, but not
> those ~10 nasty ones which fail like 1 out of 50 times, and screw our build.
> - This isn’t something that can be done manually on a daily basis. We need
> automation.
>
> *Changes made so far:*
> Code changes: HBASE-15839
> <https://issues.apache.org/jira/browse/HBASE-15839>  (Umbrella issue)
>
> *Jenkins changes:*
>
>
> [Diagram link:
> https://issues.apache.org/jira/secure/attachment/12804292/Screen%20Shot%202016-05-16%20at%204.02.46%20PM.png
> ]
> 
> *(new job) HBase-Find-Flaky-Tests*: Gets test reports of recent builds of
> post-commit job (TRUNK_matrix) and HBase-Flaky-Tests job (see below) to
> find flaky tests. How often this job runs determines how fast we catch test
> regressions. So if we run it every 4 hours, any test which started failing
> in post-commit job (TRUNK_matrix) in the last 4 hours will be blacklisted.
>
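A rough sketch of the kind of aggregation HBase-Find-Flaky-Tests is described as doing above. The report shape (one dict per build, mapping test name to pass/fail), the name `find_flaky_tests`, and the `min_failures` threshold are assumptions made for illustration; this is not the script attached to HBASE-15839.

```python
from collections import Counter


def find_flaky_tests(build_reports, min_failures=1):
    """Return {test_name: failure_rate} for tests that failed in recent builds.

    build_reports: list of dicts, one per build, mapping test name -> True if
    the test passed in that build (hypothetical report format).
    min_failures: failures needed before a test is listed (assumed threshold).
    """
    runs = Counter()
    failures = Counter()
    for report in build_reports:
        for test, passed in report.items():
            runs[test] += 1
            if not passed:
                failures[test] += 1
    return {
        test: failures[test] / runs[test]
        for test in runs
        if failures[test] >= min_failures
    }


# Three recent builds; TestB failed once out of two runs, TestC failed its only run.
reports = [
    {"TestA": True, "TestB": False},
    {"TestA": True, "TestB": True},
    {"TestA": True, "TestC": False},
]
flaky = find_flaky_tests(reports)
```

The keys of the returned dict would be the list handed to the post-commit and pre-commit jobs as excludes; the failure rates are kept so the list can be sorted or reported later.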
> *(new job) HBase-Flaky-Tests*: This job runs only the flaky tests. The
> aim is to run this job back-to-back to collect as many runs as we can.
> The higher the run rate, the better our system will be at catching the flaky
> tests. We currently run it hourly, so we’ll be able to keep track of flaky
> tests with ~5% failure rate or more.
>
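One way to see why the run rate matters, as a sketch that assumes each run of a flaky test fails independently with a fixed probability (an assumption; real flaky tests are often correlated with load or ordering). The function names are made up for this example.

```python
import math


def chance_of_seeing_failure(failure_rate, runs):
    """Probability that a test fails at least once in `runs` independent runs."""
    return 1.0 - (1.0 - failure_rate) ** runs


def runs_needed(failure_rate, confidence):
    """Smallest number of runs giving at least `confidence` chance of a failure."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


# Hourly job: 24 runs per day.
p_5_percent = chance_of_seeing_failure(0.05, 24)   # test failing ~1 in 20 runs
p_2_percent = chance_of_seeing_failure(0.02, 24)   # test failing ~1 in 50 runs
runs_for_95 = runs_needed(0.05, 0.95)
```

Under that assumption, a day of hourly runs shows a failure of a 5% test about 71% of the time, and of a 1-in-50 test only about 38% of the time, which is why running the job back-to-back helps with the rarer ones.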
> *Post-commit (TRUNK_matrix) and pre-commit jobs*: Exclude these flaky
> tests.
>
>
> *So what if a bad commit makes a good test bad?*
> Since the test is not on the flaky list, it’ll run in the next post-commit
> and will fail. The next run of HBase-Find-Flaky-Tests will pick it up and
> blacklist it.
> Blacklisting will help keep the post-commit job and more importantly
> pre-commit job clean, a problem we face quite often.
>
> *Are we just tucking away our shit?*
> Nope, this will help us:
> - first, maintain a list of bad tests (we lack that today).
> - second, make our build greener to the point that a failed/red build is
> something we worry about seriously.
>
> Once we are confident that the system is working fine, we’ll set up the
> HBase-Find-Flaky-Tests job to send reports to dev@hbase so that devs know
> about the bad tests. If the list remains hidden somewhere in a Jenkins job’s
> archive, it’s unlikely that we’ll actively work on getting them fixed :).
>
> I'll keep posting further updates on this thread.
>
> -- Appy
>
