Re: Smart Flaky Handler

Nick Dimiduk Fri, 20 May 2016 11:05:06 -0700

Nice work Appy! What do I need to do to get it wired up for branch-1.1?

On Fri, May 20, 2016 at 9:25 AM, Stack <[email protected]> wrote:


> The system seems to be working nicely Appy. We are getting green precommit
> builds for the first time in ages.
>
> Should we change the includes and excludes lists so they have a file type
> ending? .txt? Then I could open them easily in the browser. Currently I
> have to download them.
>
> Includes are tests that are currently considered 'flakey'?
>
>
> TestGenerateDelegationToken,TestMobCompactor,TestRegionServerMetrics,TestAcidGuarantees,TestMasterReplication,TestRowProcessorEndpoint,TestAsyncLogRolling,DynamicLogicExpressionSuite,TestMasterFailoverWithProcedures,TestChoreService,TestScannerHeartbeatMessages,TestWALProcedureStore,TestRegionMergeTransactionOnCluster,TestSaslFanOutOneBlockAsyncDFSOutput,TestReplicationEndpointWithMultipleWAL
>
> We have a nice list.
>
> Excludes are:
>
>
> **/TestGenerateDelegationToken.java,**/TestMobCompactor.java,**/TestRegionServerMetrics.java,**/TestAcidGuarantees.java,**/TestMasterReplication.java,**/TestRowProcessorEndpoint.java,**/TestAsyncLogRolling.java,**/DynamicLogicExpressionSuite.java,**/TestMasterFailoverWithProcedures.java,**/TestChoreService.java,**/TestScannerHeartbeatMessages.java,**/TestWALProcedureStore.java,**/TestRegionMergeTransactionOnCluster.java,**/TestSaslFanOutOneBlockAsyncDFSOutput.java,**/TestReplicationEndpointWithMultipleWAL.java,
>
> What's the '**/' about? Is it supposed to have opening/closing versions?
>
> Thanks boss,
> St.
>
>
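On the '**/' question above: in Ant-style path globs, which Maven Surefire's include/exclude settings accept, a leading '**/' matches any number of directories, so it has no closing counterpart. The snippet below is only a sketch of how an excludes string like the one quoted could be derived from the includes list; the function name is hypothetical and is not taken from the HBASE-15839 patch.

```python
def to_exclude_patterns(test_names):
    """Turn bare test class names into '**/Name.java' glob patterns.

    Hypothetical helper for illustration; the real job may build the
    list differently.
    """
    return ",".join("**/{}.java".format(name) for name in test_names)


if __name__ == "__main__":
    includes = "TestChoreService,TestAcidGuarantees"
    print(to_exclude_patterns(includes.split(",")))
```

Each pattern then matches the test's source file wherever it lives in the module tree.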
>
> On Mon, May 16, 2016 at 4:45 PM, Stack <[email protected]> wrote:
>
> > Sweet!
> >
> > On Mon, May 16, 2016 at 4:38 PM, Apekshit Sharma <[email protected]>
> > wrote:
> >
> >> This mail is to introduce the work to tackle the flaky tests in our
> >> build.
> >>
> >> *Why is it important?*
> >> - Our build history sucks: the last 175 post-commit runs failed. We need
> >> to make it useful.
> >> - To better understand our code’s testing status and, more importantly,
> >> its weak points.
> >> - We know the 2-3 tests which keep failing every now and then, but not
> >> the ~10 nasty ones which fail about 1 time in 50 and screw up our build.
> >> - This isn’t something that can be done manually on a daily basis. We
> >> need automation.
> >>
> >> *Changes made so far:*
> >> Code changes: HBASE-15839
> >> <https://issues.apache.org/jira/browse/HBASE-15839>  (Umbrella issue)
> >>
> >> *Jenkins changes:*
> >>
> >>
> >> [Diagram link:
> >> https://issues.apache.org/jira/secure/attachment/12804292/Screen%20Shot%202016-05-16%20at%204.02.46%20PM.png
> >> ]
> >> 
> >> *(new job) HBase-Find-Flaky-Tests*: Gets test reports of recent builds
> >> of the post-commit job (TRUNK_matrix) and the HBase-Flaky-Tests job (see
> >> below) to find flaky tests. How often it runs determines how fast we
> >> catch test regressions. So if we run it every 4 hours, any test which
> >> started failing in the post-commit job (TRUNK_matrix) in the last 4 hours
> >> will be blacklisted.
> >>
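The selection step described for HBase-Find-Flaky-Tests can be pictured roughly as follows. This is a minimal sketch under assumptions: the input shape (one dict of test name to pass/fail per recent build) and the function name are hypothetical, and how the real job fetches and parses Jenkins test reports is not shown.

```python
from collections import defaultdict


def find_flaky(builds):
    """Return {test: (failures, runs)} for tests that failed at least once.

    `builds` is a list of dicts mapping test name -> True (passed) or
    False (failed), one dict per recent build. Hypothetical input shape.
    """
    runs = defaultdict(int)
    failures = defaultdict(int)
    for results in builds:
        for test, passed in results.items():
            runs[test] += 1
            if not passed:
                failures[test] += 1
    # Any test with a failure in the window becomes a blacklist candidate.
    return {t: (failures[t], runs[t]) for t in runs if failures[t] > 0}


if __name__ == "__main__":
    recent = [
        {"TestChoreService": True, "TestAcidGuarantees": False},
        {"TestChoreService": True, "TestAcidGuarantees": True},
    ]
    print(find_flaky(recent))
```

Keeping the failure and run counts alongside each name makes it possible to report a per-test failure rate later.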
> >> *(new job) HBase-Flaky-Tests*: This job runs only the flaky tests. The
> >> aim is to run this job back-to-back to collect as many runs as we can.
> >> The higher the run rate, the better our system will be at catching the
> >> flaky tests. We currently run it hourly, so we’ll be able to keep track
> >> of flaky tests with a ~5% failure rate or more.
> >>
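One way to see why run rate matters: if a test fails independently with probability p on each run, the chance of seeing at least one failure in n runs is 1 - (1 - p)^n. The sketch below works that out; the independence assumption and the 24-run window (one day of hourly runs) are illustrative choices, not figures from the mail.

```python
def detection_probability(failure_rate, runs):
    """Chance of observing at least one failure in `runs` independent runs."""
    return 1.0 - (1.0 - failure_rate) ** runs


if __name__ == "__main__":
    # Assumed window: 24 hourly runs of a test that fails 5% of the time.
    print(round(detection_probability(0.05, 24), 2))
```

Under these assumptions a 5% flaky test shows up in roughly 7 out of 10 days of hourly runs, and more runs per day push that figure higher.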
> >> *Post-commit (TRUNK_matrix) and pre-commit jobs*: Exclude these flaky
> >> tests.
> >>
> >>
> >> *So what if a bad commit makes a good test bad?*
> >> Since the test is not on the flaky list, it’ll run in the next
> >> post-commit build and fail. The next run of HBase-Find-Flaky-Tests will
> >> pick it up and blacklist it. Blacklisting will help keep the post-commit
> >> job and, more importantly, the pre-commit job clean, a problem we face
> >> quite often.
> >>
> >> *Are we just tucking away our shit?*
> >> Nope, this will help us:
> >> - first, maintain a list of bad tests (we lack that today).
> >> - second, make our build greener to the point that a failed/red build is
> >> something we worry about seriously.
> >>
> >> Once we are confident that the system is working fine, we’ll set up the
> >> HBase-Find-Flaky-Tests job to send reports to dev@hbase so that devs
> >> know about the bad tests. If the list stays hidden somewhere in a
> >> Jenkins job’s archive, it’s unlikely that we’ll actively work on getting
> >> the tests fixed :).
> >>
> >> I'll keep posting further updates on this thread.
> >>
> >> -- Appy
> >>
> >
> >
>
