Nice work, Appy! What do I need to do to get it wired up for branch-1.1?

On Fri, May 20, 2016 at 9:25 AM, Stack <[email protected]> wrote:
> The system seems to be working nicely, Appy. We are getting green precommit
> builds for the first time in ages.
>
> Should we change the includes and excludes lists so they have a file type
> ending? .txt? Then I could open them easily in the browser. Currently I
> have to download them.
>
> Includes are tests that are currently considered 'flaky'?
>
> TestGenerateDelegationToken,TestMobCompactor,TestRegionServerMetrics,TestAcidGuarantees,TestMasterReplication,TestRowProcessorEndpoint,TestAsyncLogRolling,DynamicLogicExpressionSuite,TestMasterFailoverWithProcedures,TestChoreService,TestScannerHeartbeatMessages,TestWALProcedureStore,TestRegionMergeTransactionOnCluster,TestSaslFanOutOneBlockAsyncDFSOutput,TestReplicationEndpointWithMultipleWAL
>
> We have a nice list.
>
> Excludes are:
>
> **/TestGenerateDelegationToken.java,**/TestMobCompactor.java,**/TestRegionServerMetrics.java,**/TestAcidGuarantees.java,**/TestMasterReplication.java,**/TestRowProcessorEndpoint.java,**/TestAsyncLogRolling.java,**/DynamicLogicExpressionSuite.java,**/TestMasterFailoverWithProcedures.java,**/TestChoreService.java,**/TestScannerHeartbeatMessages.java,**/TestWALProcedureStore.java,**/TestRegionMergeTransactionOnCluster.java,**/TestSaslFanOutOneBlockAsyncDFSOutput.java,**/TestReplicationEndpointWithMultipleWAL.java,
>
> What's the '**/' about? Is it supposed to have opening/closing versions?
>
> Thanks boss,
> St.
>
> On Mon, May 16, 2016 at 4:45 PM, Stack <[email protected]> wrote:
>
> > Sweet!
> >
> > On Mon, May 16, 2016 at 4:38 PM, Apekshit Sharma <[email protected]> wrote:
> >
> >> This mail is to introduce the work to tackle the flaky tests in our build.
> >>
> >> *Why is it important?*
> >> - Our build history sucks: the last 175 post-commit runs failed. We need to make it useful.
> >> - To better understand our code’s testing status and, more importantly, its weak points.
> >> - We know those 2-3 tests which keep failing every now and then, but not the ~10 nasty ones which fail about 1 out of 50 times and screw up our build.
> >> - This isn’t something that can be done manually on a daily basis. We need automation.
> >>
> >> *Changes made so far:*
> >> Code changes: HBASE-15839 <https://issues.apache.org/jira/browse/HBASE-15839> (umbrella issue)
> >>
> >> *Jenkins changes:*
> >>
> >> [Diagram link: https://issues.apache.org/jira/secure/attachment/12804292/Screen%20Shot%202016-05-16%20at%204.02.46%20PM.png]
> >>
> >> *(new job) HBase-Find-Flaky-Tests*: Gets test reports from recent builds of the post-commit job (TRUNK_matrix) and the HBase-Flaky-Tests job (see below) to find flaky tests. How often it runs determines how fast we catch test regressions: if we run it every 4 hours, any test that started failing in the post-commit job (TRUNK_matrix) in the last 4 hours will be blacklisted.
> >>
> >> *(new job) HBase-Flaky-Tests*: This job runs only the flaky tests. The aim is to run this job back-to-back to collect as many runs as we can. The higher the run rate, the better our system will be at catching flaky tests. We currently run it hourly, so we’ll be able to keep track of flaky tests with a failure rate of ~5% or more.
> >>
> >> *Post-commit (TRUNK_matrix) and pre-commit jobs*: Exclude these flaky tests.
> >>
> >> *So what if a bad commit makes a good test bad?*
> >> Since the test isn’t on the flaky list, it’ll run in the next post-commit build and fail. The next run of HBase-Find-Flaky-Tests will pick it up and blacklist it. Blacklisting will help keep the post-commit job, and more importantly the pre-commit job, clean; keeping them clean is a problem we face quite often.
> >>
> >> *Are we just tucking away our shit?*
> >> Nope, this will help us:
> >> - first, maintain a list of bad tests (we lack that today).
> >> - second, make our build green enough that a failed/red build is something we take seriously.
> >>
> >> Once we are confident that the system is working fine, we’ll set up the HBase-Find-Flaky-Tests job to send reports to dev@hbase so that devs know about the bad tests. If the list remains hidden somewhere in a Jenkins job’s archive, it’s unlikely that we’ll actively work on getting them fixed :).
> >>
> >> I'll keep posting further updates on this thread.
> >>
> >> -- Appy
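The HBase-Find-Flaky-Tests job described above boils down to three steps: collect per-test results from recent builds of the post-commit and HBase-Flaky-Tests jobs, flag tests that failed in that window, and emit the includes/excludes lists Stack quoted. Below is a minimal Python sketch of that flow. It assumes the Jenkins reports have already been fetched and reduced to a hypothetical `{build_number: {test_name: passed}}` mapping; the function names, the data shape and the "any failure in the window" rule are illustrative assumptions, not the actual HBASE-15839 code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical shape: build number -> {test class name -> True if it passed}.
BuildReports = Dict[int, Dict[str, bool]]


def failure_stats(reports: BuildReports) -> Dict[str, Tuple[int, int]]:
    """Return {test: (failures, runs)} aggregated over all builds in the window."""
    stats: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for results in reports.values():
        for test, passed in results.items():
            stats[test][1] += 1          # count every run of the test
            if not passed:
                stats[test][0] += 1      # count only the failed runs
    return {test: (fails, runs) for test, (fails, runs) in stats.items()}


def find_flaky(reports: BuildReports) -> List[str]:
    """Assumed rule: blacklist any test that failed at least once in the window."""
    return sorted(t for t, (fails, _) in failure_stats(reports).items() if fails > 0)


def includes_and_excludes(flaky: List[str]) -> Tuple[str, str]:
    """Build comma-separated includes (bare names) and excludes (**/Name.java)."""
    includes = ",".join(flaky)
    # "**/" is an Ant-style prefix: match the file in any (package) directory.
    excludes = ",".join(f"**/{name}.java" for name in flaky)
    return includes, excludes


if __name__ == "__main__":
    reports: BuildReports = {
        101: {"TestChoreService": True, "TestAcidGuarantees": False, "TestFoo": True},
        102: {"TestChoreService": False, "TestAcidGuarantees": True, "TestFoo": True},
        103: {"TestChoreService": True, "TestAcidGuarantees": True, "TestFoo": True},
    }
    inc, exc = includes_and_excludes(find_flaky(reports))
    print(inc)  # TestAcidGuarantees,TestChoreService
    print(exc)  # **/TestAcidGuarantees.java,**/TestChoreService.java
```

Blacklisting on a single failure is the simplest rule consistent with "any test which started failing ... will be blacklisted"; a real job could instead use `failure_stats` to apply a failure-rate threshold or require a minimum number of runs.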
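On Stack's question about the '**/': the excludes entries look like Ant-style path patterns of the kind Maven Surefire accepts in its `<excludes>` configuration. There, `**/` matches zero or more leading directories, so `**/TestChoreService.java` matches that file in whatever package directory it lives in, and no closing counterpart is needed. The sketch below is a simplified, hypothetical translator from such patterns to Python regular expressions. It handles only `**/`, `*` and `?` and is not Surefire's own matcher.

```python
import re


def ant_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a simplified Ant-style pattern into a compiled regex.

    Supported tokens (a subset of Ant syntax):
      **/  -> zero or more whole directories
      *    -> any run of characters within one path segment
      ?    -> exactly one character within one path segment
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:[^/]*/)*")
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    # Anchor at the end; re.match already anchors at the start.
    return re.compile("".join(parts) + r"\Z")


def is_excluded(path: str, excludes: str) -> bool:
    """True if the '/'-separated path matches any comma-separated pattern."""
    # Skip empty entries, e.g. the trailing comma in the quoted excludes list.
    patterns = [p.strip() for p in excludes.split(",") if p.strip()]
    return any(ant_pattern_to_regex(p).match(path) for p in patterns)


if __name__ == "__main__":
    excludes = "**/TestChoreService.java,**/TestAcidGuarantees.java,"
    print(is_excluded("org/apache/hadoop/hbase/TestChoreService.java", excludes))  # True
    print(is_excluded("TestChoreService.java", excludes))  # True
    print(is_excluded("org/apache/hadoop/hbase/TestChoreServiceExtra.java", excludes))  # False
```

Because `**/` also matches zero directories, the same pattern covers a test at the top level and one deep inside a package tree, which is why each list entry only needs the prefix.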
