RE: State of the 0.94 tests

Ramkrishna.S.Vasudevan Sun, 07 Oct 2012 21:03:08 -0700

Hi Lars

I was out of town and traveling for the last 2 days.  I will check the
reason for the test case failures right away.  Had I been around, I would
have helped out earlier.


Sorry about that.

Regards
Ram

> -----Original Message-----
> From: lars hofhansl [mailto:[email protected]]
> Sent: Monday, October 08, 2012 2:07 AM
> To: [email protected]
> Subject: Re: State of the 0.94 tests
> 
> After this change things look better. Apologies for the noise. Stay
> tuned for the next RC.
> 
> -- Lars
> 
> 
> 
> ________________________________
> From: lars hofhansl <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Sunday, October 7, 2012 11:23 AM
> Subject: Re: State of the 0.94 tests
> 
> I looked back through the failures. I had recently enabled all "ubuntu"
> build vms for the 0.94 builds.
> It turns out that most of the environment issues occur on ubuntu2. I
> excluded that from the build vms.
> 
> 
> -- Lars
> 
> 
> 
> ________________________________
> From: Andrew Purtell <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Sunday, October 7, 2012 1:36 AM
> Subject: Re: State of the 0.94 tests
> 
> Too many open files usually is an environment issue.
> 
> Lars, you should consider setting up a private Jenkins as a sanity
> check.
> 
> On Oct 7, 2012, at 2:41 PM, lars hofhansl <[email protected]> wrote:
> 
> > Looks like after all that whining I finally got a successful build.
> > But I lost confidence in the current 0.94 code line.
> >
> > Still, it is possible that all of these were environmental issues. If
> > we can get a few more successful runs, it could be OK.
> >
> > -- Lars
> >
> >
> >
> > ________________________________
> > From: lars hofhansl <[email protected]>
> > To: hbase-dev <[email protected]>
> > Sent: Saturday, October 6, 2012 11:11 PM
> > Subject: State of the 0.94 tests
> >
> > I've been trying (essentially the entire day) to get a successful
> > Jenkins build for 0.94 (triggering the test run periodically from my
> > phone). Not a *single* run succeeded.
> > This is clearly not acceptable. Something is off.
> >
> > The tests that fail most frequently are:
> > - TestSplitTransactionOnCluster.testShouldThrowIOExceptionIfStoreFileSizeIsEmptyAndSHouldSuccessfullyExecuteRollback
> > - TestSplitTransactionOnCluster.testShouldClearRITWhenNodeFoundInSplittingState
> > (Most of the time the failure cause is too many open files, but they
> > also fail because of unavailable regions.)
> >
> > Both tests were added recently (since 0.94.2RC2). See HBASE-6854 and
> > HBASE-6853.
> >
> > Either there is something wrong with the tests, or we introduced some
> > problems in the code base.
> >
> > Note that I am not dinging these two changes specifically. Both were
> > fixes with a lot of thought and care behind them.
> >
> > There are also various time out issues in other tests.
> >
> > These were all the fixes added since the last RC:
> > [HBASE-4565] - Maven HBase build broken on cygwin with copynativelib.sh call
> > [HBASE-6299] - RS starting region open while failing ack to HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a series of successive problems
> > [HBASE-6679] - RegionServer aborts due to race between compaction and split
> > [HBASE-6688] - folder referred by thrift demo app instructions is outdated
> > [HBASE-6854] - Deletion of SPLITTING node on split rollback should clear the region from RIT
> > [HBASE-6871] - HFileBlockIndex Write Error in HFile V2 due to incorrect split into intermediate index blocks
> > [HBASE-6888] - HBase scripts ignore any HBASE_OPTS set in the environment
> > [HBASE-6889] - Ignore source control files with apache-rat
> > [HBASE-6900] - RegionScanner.reseek() creates NPE when a flush or compaction happens before the reseek.
> > [HBASE-6901] - Store file compactSelection throws ArrayIndexOutOfBoundsException
> > [HBASE-6906] - TestHBaseFsck#testQuarantine* tests are flakey due to TableNotEnabledException
> > [HBASE-6912] - Filters are not properly applied in certain cases
> > [HBASE-6916] - HBA logs at info level errors that won't show in the shell
> > [HBASE-6920] - On timeout connecting to master, client can get stuck and never make progress
> > [HBASE-6927] - WrongFS using HRegionInfo.getTableDesc() and different fs for hbase.root and fs.defaultFS
> > [HBASE-6946] - JavaDoc missing from release tarballs
> > [HBASE-5582] - "No HServerInfo found for" should be a WARNING message
> > [HBASE-6914] - Scans/Gets/Mutations don't give a good error if the table is disabled.
> > [HBASE-6853] - IllegalArgument Exception is thrown when an empty region is spliitted.
> >
> > Unless somebody (Ram :) ) speaks up I will roll back HBASE-6854 and
> > HBASE-6853 (and maybe HBASE-6299).
> >
> > I could also roll all of these back except HBASE-6920 (which is the
> > one that sank the last RC), and leave the rest for the next RC.
> >
> > Also, from now on, at least until 0.94.2 is released, please clear
> > all 0.94 changes with me before you commit. There is clearly too much
> > churn going into 0.94 too quickly, which prevents 0.94.2 from
> > stabilizing.
> >
> > -- Lars
