[ https://issues.apache.org/jira/browse/HBASE-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-2414:
-------------------------

    Attachment: testmaster-v16.txt

I want to commit this patch.  Can I get a review?  It's kinda polluted in that 
it contains:

1. Means of testing state transitions across the master.  Tests can register 
listeners.  In the listener you should be able to delay, cancel, count, etc., 
RegionServerOperations.  It should be possible in the listener to simulate 
the ugliness seen out on live clusters.
2. A fix for the HBASE-2428 bug.
3. Facility added to HBaseTestingUtility and MiniHBaseCluster.
4. I started refactoring the queue of RegionServerOperations in master, 
moving it out to a separate file to make it testable, but then I ran into the 
fact that RegionServerOperations each have a reference to master AND they can 
put themselves back on the queue -- the circularity baffles.  This has to be 
fixed, but I will do it in a separate patch.
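
To illustrate item 1, here is a minimal sketch of a queue that notifies registered listeners before and after each operation, so that a test listener can count or cancel operations. All names here (SketchOperation, SketchOperationListener, SketchOperationQueue, CountingCancellingListener) are hypothetical stand-ins made up for the example; they are not the classes or signatures in the attached patch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for a RegionServerOperation; not the HBase class.
interface SketchOperation {
    String name();
}

// Hypothetical listener; the patch's real interface may look different.
interface SketchOperationListener {
    // Return false to cancel the operation before it is processed.
    boolean beforeProcess(SketchOperation op);

    // Called only for operations that were actually processed.
    void afterProcess(SketchOperation op);
}

final class SketchOperationQueue {
    private final Queue<SketchOperation> toDo = new ArrayDeque<>();
    private final List<SketchOperationListener> listeners = new ArrayList<>();
    private final List<String> processed = new ArrayList<>();

    void registerListener(SketchOperationListener listener) {
        listeners.add(listener);
    }

    void put(SketchOperation op) {
        toDo.add(op);
    }

    // Takes one operation off the queue; returns false when the queue is empty.
    boolean processOne() {
        SketchOperation op = toDo.poll();
        if (op == null) {
            return false;
        }
        for (SketchOperationListener listener : listeners) {
            if (!listener.beforeProcess(op)) {
                return true; // cancelled by a listener, so skip processing
            }
        }
        processed.add(op.name()); // stands in for the real work
        for (SketchOperationListener listener : listeners) {
            listener.afterProcess(op);
        }
        return true;
    }

    List<String> processed() {
        return processed;
    }
}

// A test listener that counts every operation and cancels those with a given name.
final class CountingCancellingListener implements SketchOperationListener {
    private final String nameToCancel;
    int seen = 0;
    int completed = 0;

    CountingCancellingListener(String nameToCancel) {
        this.nameToCancel = nameToCancel;
    }

    @Override
    public boolean beforeProcess(SketchOperation op) {
        seen++;
        return !op.name().equals(nameToCancel);
    }

    @Override
    public void afterProcess(SketchOperation op) {
        completed++;
    }
}
```

A delay could be simulated the same way by sleeping inside beforeProcess; the sketch leaves that out to stay deterministic.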

Here is a commit message with some detail on the commit:

{code}
M src/test/org/apache/hadoop/hbase/HBaseTestingUtility.java
  Broke up startMiniHBaseCluster into smaller methods so can mix and
  match pieces of minihbasecluster toward other ends.
  (setupClusterBuildDir, isRunningCluster, getMiniHBaseCluster): Added.
M src/test/org/apache/hadoop/hbase/TestInfoServers.java
M src/test/org/apache/hadoop/hbase/TestRegionRebalancing.java
M src/test/org/apache/hadoop/hbase/HBaseClusterTestCase.java
M src/test/org/apache/hadoop/hbase/regionserver/TestLogRolling.java
M src/test/org/apache/hadoop/hbase/regionserver/DisabledTestRegionServerExit.java
M src/test/org/apache/hadoop/hbase/mapreduce/TestTableIndex.java
M src/test/org/apache/hadoop/hbase/mapred/TestTableIndex.java
  Ripple from change of MiniHBaseCluster.getRegionThreads to
  getRegionServerThreads.
M src/test/org/apache/hadoop/hbase/MiniHBaseCluster.java
  Added new MiniHBaseClusterMaster that is override of HMaster so
  I can piggyback messages for designated regionservers atop the
  heartbeat: close region, etc.
  (getServerWithMeta, addMessageToSendRegionServer): Added.
A src/test/org/apache/hadoop/hbase/master/TestRegionServerOperationQueue.java
  Stubbed out test of new RegionServerOperationQueue class.
A src/test/org/apache/hadoop/hbase/master/TestMasterTransistions.java
  Test master cluster transitions.  Includes unit test of hbase-2428.
M src/test/org/apache/hadoop/hbase/util/TestMigration.java
  Disable migration test.  Nothing to migrate yet and besides it was
  trying to load a 0.19 hbase data tar.gz that has since been removed.
M src/contrib/stargate/build.xml
  Added a copyright.
M src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
  Documentation and moved some methods from down at tail of the class
  where they were in among static methods used parsing cmd-line usage
  up above the usage and master startup static methods.
  Added fix for issue broken by an hbase-1215 commit where we were
  looking at wrong address (Grep for r796326 for more).
M src/java/org/apache/hadoop/hbase/LocalHBaseCluster.java
  Moved bulk out to new JVMClusterUtil class and made accessible.
  Added passing of HMaster.class to instantiate to facilitate
  passing of TestHMaster.class.
M src/java/org/apache/hadoop/hbase/master/RegionServerOperationQueue.java
  The RegionServerOperations queues moved out to their own class from
  Master.  Allows listeners to register and get notice before and after
  a RegionServerOperation is processed.  Includes part of bug fix for
  hbase-2428. When there was an error processing a RegionServerOperation,
  we'd fall into the IOException catch.  We'd then put the operation back on
  the delay queue for later processing, only we'd not reset its expiration.
  It would therefore run again immediately... fail again, and so on.
  Changed the return from process to be an enum rather than true/false
  so I don't have to do things like call checkfs down in here and
  I don't need to have a master instance around.
M src/java/org/apache/hadoop/hbase/master/ServerManager.java
  How we add RegionServerOperation instances has changed to go via
  RegionServerOperationQueue now.
M src/java/org/apache/hadoop/hbase/master/ProcessServerShutdown.java
  (getDeadServerAddress): Added.
M src/java/org/apache/hadoop/hbase/master/RegionServerOperationListener.java
  Listener interface to implement if interested in watching
  RegionServerOperations.
M src/java/org/apache/hadoop/hbase/master/HMaster.java
  Moved the RegionServerOperation code out of here to
  RegionServerOperationQueue.
  (adornRegionServerAnswer, constructMaster): Added.
M src/java/org/apache/hadoop/hbase/master/ProcessRegionOpen.java
  Comment.
M src/java/org/apache/hadoop/hbase/master/ProcessRegionClose.java
  Added in *fix* for 2428 NPE.  For now did what happens in
  ProcessRegionOpen for symmetry's sake but it needs to be replaced.
M src/java/org/apache/hadoop/hbase/master/RegionServerOperation.java
  (resetExpiration): Added.
M src/java/org/apache/hadoop/hbase/master/ProcessRegionStatusChange.java
  Javadoc.
M src/java/org/apache/hadoop/hbase/util/Threads.java
  (threadDumpingIsAlive): Added from LocalHBaseCluster.
  (sleep): Added.
M src/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java
  New class that has facility moved from LocalHBaseCluster with added
  javadoc and made accessible.  Needed for testing.
{code}
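
The requeue problem described under RegionServerOperationQueue.java can be sketched roughly as follows. This is an illustration only, assuming a java.util.concurrent.DelayQueue and a manually advanced clock so the behavior is deterministic; RetryableOp, RetryQueueSketch and SketchResult are hypothetical names, and only the idea of a resetExpiration call and an enum return value comes from the commit message.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical result type; the enum in the patch may have other values.
enum SketchResult { PROCESSED, REQUEUED }

// Hypothetical operation that fails a fixed number of times before succeeding.
final class RetryableOp implements Delayed {
    private final AtomicLong clockMillis; // manual clock, advanced by the caller
    private final long delayMillis;
    private final int failuresBeforeSuccess;
    private long expirationMillis;
    int attempts = 0;

    RetryableOp(AtomicLong clockMillis, long delayMillis, int failuresBeforeSuccess) {
        this.clockMillis = clockMillis;
        this.delayMillis = delayMillis;
        this.failuresBeforeSuccess = failuresBeforeSuccess;
        this.expirationMillis = clockMillis.get(); // due immediately on first try
    }

    // Push the expiration out again so a requeued operation waits before retrying.
    void resetExpiration() {
        expirationMillis = clockMillis.get() + delayMillis;
    }

    // Returns true once the simulated failures have been used up.
    boolean process() {
        attempts++;
        return attempts > failuresBeforeSuccess;
    }

    @Override
    public long getDelay(TimeUnit unit) {
        return unit.convert(expirationMillis - clockMillis.get(), TimeUnit.MILLISECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.MILLISECONDS),
                            other.getDelay(TimeUnit.MILLISECONDS));
    }
}

final class RetryQueueSketch {
    private final DelayQueue<RetryableOp> delayed = new DelayQueue<>();

    void put(RetryableOp op) {
        delayed.put(op);
    }

    // Processes one operation whose delay has expired; returns null if none is due.
    SketchResult processOneDue() {
        RetryableOp op = delayed.poll();
        if (op == null) {
            return null;
        }
        if (op.process()) {
            return SketchResult.PROCESSED;
        }
        // Without this reset the operation would be due again at once and
        // would fail in a tight loop.
        op.resetExpiration();
        delayed.put(op);
        return SketchResult.REQUEUED;
    }
}
```

With the reset in place, a failed operation is not handed out again until the clock has advanced by the configured delay.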

> Enhance test suite to be able to specify distributed scenarios
> --------------------------------------------------------------
>
>                 Key: HBASE-2414
>                 URL: https://issues.apache.org/jira/browse/HBASE-2414
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.20.3
>            Reporter: Karthik Ranganathan
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.20.5, 0.21.0
>
>         Attachments: directcluster.txt, master2.txt, testmaster-v11.patch, 
> testmaster-v14.txt, testmaster-v16.txt, testmaster-v4.patch, 
> testmaster-v5.patch, testmaster-v7.patch, testmaster-v8.patch
>
>
> We keep finding good cases that are reasonably hard to test, yet the test 
> suite does not encode these. 
> For example: 
> HBASE-2413 Master does not respect generation stamps, may result in meta 
> getting permanently offlined
> HBASE-2312 Possible data loss when RS goes into GC pause while rolling HLog
> I am sure there are many more such "scenarios" we should put into the unit 
> tests. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
