Similar to Enis's comments -- it is not just the new tests running N times -- it is that sometimes they leave junk behind that pollutes other tests. I'm currently having similar problems on the trunk+snapshot branch.
I've also been running trunk+snapshot, and some tests seem broken there. (I had started on some of them a while back; it's probably time to get back to them.)

Jon.

On Wed, Dec 26, 2012 at 8:05 PM, Andrew Purtell <[email protected]> wrote:

> Hmm... How about just adding to the contributor section that new tests
> should run reliably N times locally. N=10? N=20? N=100?
>
>
> On Wed, Dec 26, 2012 at 12:02 PM, Enis Söztutar <[email protected]> wrote:
>
>> Just a reference of some of the recent efforts that went in:
>> HBASE-7432 TestHBaseFsck prevents testsuite from finishing
>> HBASE-7431 TestSplitTransactionOnCluster tests still flaky
>> HBASE-7417 Test patch, hopefully fixes TestReplication
>> HBASE-7421 TestHFileCleaner->testHFileCleaning has an aggressive timeout
>> HBASE-7398 [0.94 UNIT TESTS] TestAssignmentManager fails frequently on CentOS 5
>> HBASE-7338 Fix flaky condition for org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange
>> HBASE-6175 TestFSUtils flaky on hdfs getFileStatus method
>> HBASE-7343 Fix flaky condition for TestDrainingServer (Himanshu)
>> HBASE-7301 Force ipv4 for unit tests
>> HBASE-7300 HbckTestingUtil needs to keep a static executor to lower the number of threads used
>> HBASE-6206 Large tests fail with jdk1.7
>> HBASE-7252 TestSizeBasedThrottler fails occasionally
>> HBASE-7235 TestMasterObserver is flaky
>> HBASE-7172 TestSplitLogManager.testVanishingTaskZNode() fails when run individually and is flaky
>> HBASE-7177 TestZooKeeperScanPolicyObserver.testScanPolicyObserver is flaky
>> HBASE-7166 TestSplitTransactionOnCluster tests are flaky
>> HBASE-7165 TestSplitLogManager.testUnassignedTimeout is flaky
>> HBASE-5984 TestLogRolling.testLogRollOnPipelineRestart failed with HADOOP 2.0.0
>> HBASE-7142 TestSplitLogManager#testDeadWorker may fail because of hard limit on the TimeoutMonitor's timeout period (Himanshu)
>> HBASE-7143 TestMetaMigrationRemovingHTD fails when used with Hadoop 0.23/2.x (Andrey Klochlov)
>> HBASE-6958 TestAssignmentManager sometimes fails
>> HBASE-6305 TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds. (Himanshu)
>> HBASE-6796 ADDENDUM, remove spurious time limit from testHFileCleaning
>> HBASE-6852, REVERT again, due to unexplained test failures that only occur on the jenkins machines
>> HBASE-7077 ADDENDUM, add TestCategory
>> HBASE-6733 TestReplication.queueFailover occasionally fails [Part-2]
>> HBASE-6906 TestHBaseFsck#testQuarantine* tests are flakey due to TestNotEnabledException
>> HBASE-6784 TestCoprocessorScanPolicy is sometimes flaky when run locally
>> HBASE-6714 TestMultiSlaveReplication#testMultiSlaveReplication may fail
>> HBASE-6715 TestFromClientSide.testCacheOnWriteEvictOnClose is flaky
>>
>> Please keep these in mind when you are writing a new test.
>> Enis
>>
>> On Wed, Dec 26, 2012 at 10:03 AM, Stack <[email protected]> wrote:
>>
>> > I just added a section to the 'contributing' section on committers being
>> > responsible for ensuring contributors' patches do not break build or tests.
>> > St.Ack
>> >
>> > On Wed, Dec 26, 2012 at 9:08 AM, Stack <[email protected]> wrote:
>> >
>> > > Or there is a submitting patches section:
>> > > http://hbase.apache.org/book.html#submitting.patches
>> > > St.Ack
>> > >
>> > > On Wed, Dec 26, 2012 at 8:53 AM, Stack <[email protected]> wrote:
>> > >
>> > >> Thanks for doing the fixup "Iron Hand". +1 on these rules for a branch
>> > >> or for any branch (We'll have to do the same for trunk when it becomes
>> > >> 0.96 branch). Should we add something here:
>> > >> http://hbase.apache.org/book.html#hbase.tests Or to the community
>> > >> section: http://hbase.apache.org/book.html#community ? Or to the
>> > >> developer section?
>> > >>
>> > >> St.Ack
>> > >>
>> > >> On Tue, Dec 25, 2012 at 11:57 AM, lars hofhansl <[email protected]> wrote:
>> > >>
>> > >>> During the past few days I spent some time to bring the 0.94 tests back
>> > >>> into shape.
>> > >>>
>> > >>> GC issues, bad backports, hanging tests, memory issues, you name it.
>> > >>> I do not want to ever have to do that again.
>> > >>>
>> > >>> The good news is: The 0.94 tests are back in shape now. Yeah!
>> > >>>
>> > >>> If you commit a patch it is your responsibility to make sure it passes
>> > >>> the test suite.
>> > >>> Either the tests should be fixed in a reasonable amount of time or the
>> > >>> commit should be reverted.
>> > >>> This is mainly for committers, contributors should also watch the test
>> > >>> runs for their patches.
>> > >>> No excuses. The tests are passing now.
>> > >>> I do not care whether a test passes locally, or whether it fails rarely,
>> > >>> or whether some tests failed previously, or whatever.
>> > >>>
>> > >>> Please, consider this a condition for me to continue as release manager
>> > >>> for 0.94.
>> > >>> (This is only for the 0.94 tests. I cannot speak for HadoopQA, or the
>> > >>> regular trunk test suite, although eventually I assume we want similar
>> > >>> guidelines there)
>> > >>>
>> > >>> I increased the retention time for past builds. I will find you :)
>> > >>> I will publicly shame you. I will retroactively -1 the change and revert
>> > >>> it, and then shame you again. :)
>> > >>>
>> > >>> Lastly, this is a function of the large amount of contributed patches.
>> > >>> So it is a good problem to have.
>> > >>> HBase is an actively maintained project and we certainly want to keep it
>> > >>> this way, just with an acknowledgement that keeping the test suite passing
>> > >>> is important.
>> > >>>
>> > >>> Thanks and Merry Christmas (to whoever celebrates that).
>> > >>>
>> > >>> -- Lars
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [email protected]
