I've spent essentially the entire day trying to get a successful Jenkins 
build for 0.94 (triggering the test run periodically from my phone). Not a 
*single* run succeeded.
This is clearly not acceptable. Something is off.

The tests that fail most frequently are:
- TestSplitTransactionOnCluster.testShouldThrowIOExceptionIfStoreFileSizeIsEmptyAndSHouldSuccessfullyExecuteRollback
- TestSplitTransactionOnCluster.testShouldClearRITWhenNodeFoundInSplittingState
(Most of the time the cause is too many open files, but they also fail 
because of unavailable regions.)

Both tests were added recently (since 0.94.2RC2). See HBASE-6854 and HBASE-6853.

Either there is something wrong with the tests, or we introduced some problems 
in the code base.

Note that I am not dinging these two changes specifically. Both were fixes with 
a lot of thought and care behind them.

There are also various timeout issues in other tests.

These were all the fixes added since the last RC:
[HBASE-4565] - Maven HBase build broken on cygwin with copynativelib.sh call
[HBASE-6299] - RS starting region open while failing ack to 
HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a 
series of successive problems
[HBASE-6679] - RegionServer aborts due to race between compaction and split
[HBASE-6688] - folder referred by thrift demo app instructions is outdated
[HBASE-6854] - Deletion of SPLITTING node on split rollback should clear the 
region from RIT
[HBASE-6871] - HFileBlockIndex Write Error in HFile V2 due to incorrect split 
into intermediate index blocks
[HBASE-6888] - HBase scripts ignore any HBASE_OPTS set in the environment
[HBASE-6889] - Ignore source control files with apache-rat
[HBASE-6900] - RegionScanner.reseek() creates NPE when a flush or compaction 
happens before the reseek.
[HBASE-6901] - Store file compactSelection throws ArrayIndexOutOfBoundsException
[HBASE-6906] - TestHBaseFsck#testQuarantine* tests are flakey due to 
TableNotEnabledException
[HBASE-6912] - Filters are not properly applied in certain cases
[HBASE-6916] - HBA logs at info level errors that won't show in the shell
[HBASE-6920] - On timeout connecting to master, client can get stuck and never 
make progress
[HBASE-6927] - WrongFS using HRegionInfo.getTableDesc() and different fs for 
hbase.root and fs.defaultFS
[HBASE-6946] - JavaDoc missing from release tarballs
[HBASE-5582] - "No HServerInfo found for" should be a WARNING message
[HBASE-6914] - Scans/Gets/Mutations don't give a good error if the table is 
disabled.
[HBASE-6853] - IllegalArgument Exception is thrown when an empty region is 
spliitted.

Unless somebody (Ram :) ) speaks up, I will roll back HBASE-6854 and HBASE-6853 
(and maybe HBASE-6299).

I could also roll all of these back except HBASE-6920 (which is the one that 
sank the last RC), and leave the rest for the next RC.

Also, from now on, at least until 0.94.2 is released, please clear all 0.94 
changes with me before you commit. There is clearly too much churn going into 
0.94 too quickly, which prevents 0.94.2 from stabilizing.

-- Lars
