Sweet!

On Mon, May 16, 2016 at 4:38 PM, Apekshit Sharma <[email protected]> wrote:
> This mail is to introduce the work to tackle the flaky tests in our build.
>
> *Why is it important?*
> - Our build history sucks: the last 175 post-commit runs failed. We need to
>   make it useful.
> - To better understand our code's testing status and, more importantly, its
>   weak points.
> - We know those 2-3 tests which keep failing every now and then, but not
>   those ~10 nasty ones which fail like 1 out of 50 times and screw our build.
> - This isn't something that can be done manually on a daily basis. We need
>   automation.
>
> *Changes made so far:*
> Code changes: HBASE-15839
> <https://issues.apache.org/jira/browse/HBASE-15839> (Umbrella issue)
>
> *Jenkins changes:*
>
> [Diagram link:
> https://issues.apache.org/jira/secure/attachment/12804292/Screen%20Shot%202016-05-16%20at%204.02.46%20PM.png
> ]
>
> *(new job) HBase-Find-Flaky-Tests*: Gets test reports of recent builds of
> the post-commit job (TRUNK_matrix) and the HBase-Flaky-Tests job (see below)
> to find flaky tests. How often it runs determines how fast we catch test
> regressions. So if we run it every 4 hours, any test which started failing
> in the post-commit job (TRUNK_matrix) in the last 4 hours will be blacklisted.
>
> *(new job) HBase-Flaky-Tests*: This job runs only the flaky tests. The
> aim is to run this job back-to-back to collect as many runs as we can.
> The higher the run rate, the better our system will be at catching the
> flaky tests. We currently run it hourly, so we'll be able to keep track of
> flaky tests with ~5% failure rate or more.
>
> *Post-commit (TRUNK_matrix) and pre-commit jobs*: Exclude these flaky
> tests.
>
> *So what if a bad commit makes a good test bad?*
> Since the test is not yet marked as flaky, it'll run in the next post-commit
> build and fail. The next run of HBase-Find-Flaky-Tests will pick it up and
> blacklist it. Blacklisting will help keep the post-commit job and, more
> importantly, the pre-commit job clean, which is a problem we face quite often.
>
> *Are we just tucking away our shit?*
> Nope, this will help us:
> - first, maintain a list of bad tests (we lack that today).
> - second, make our build greener to the point that a failed/red build is
>   something we worry about seriously.
>
> Once we are confident that the system is working fine, we'll set up the
> HBase-Find-Flaky-Tests job to send reports to dev@hbase so that devs know
> about the bad tests. If the list remains hidden somewhere in a Jenkins job's
> archive, it's unlikely that we'll actively work on getting the tests fixed :).
>
> I'll keep posting further updates on this thread.
>
> -- Appy
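
For a rough feel of the run-rate point quoted above (more runs of HBase-Flaky-Tests means rarer failures get noticed), here is a minimal sketch. It assumes every run is independent and that a given test fails with a fixed probability, which real flaky tests may not obey. The function names and the 24-run window (one day of hourly runs) are my own illustration and are not taken from HBASE-15839 or the Jenkins jobs.

```python
import math


def detection_probability(failure_rate: float, runs: int) -> float:
    """Chance of seeing at least one failure in `runs` independent runs."""
    return 1.0 - (1.0 - failure_rate) ** runs


def runs_needed(failure_rate: float, confidence: float) -> int:
    """Smallest number of independent runs that shows at least one failure
    with probability >= `confidence` (requires 0 < failure_rate < 1)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # 0.02 is the "1 out of 50" case, 0.05 is the "~5%" case from the mail.
    for rate in (0.02, 0.05, 0.10):
        caught = detection_probability(rate, 24)  # hypothetical 24 hourly runs
        needed = runs_needed(rate, 0.95)
        print(f"failure rate {rate:.2f}: "
              f"P(seen in 24 runs) = {caught:.2f}, runs for 95% = {needed}")
```

Under these assumptions, a test that fails 1 run in 50 needs far more runs to show up reliably than one that fails 1 in 20, which is the argument for running the flaky job back-to-back.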
