[ https://issues.apache.org/jira/browse/HBASE-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-2414:
-------------------------
Attachment: testmaster-v16.txt
I want to commit this patch. Can I get a review? It's kind of polluted in that
it contains:
1. A means of testing state transitions across the master. Tests can register
listeners. In a listener you should be able to delay, cancel, count, etc.,
RegionServerOperations. It should be possible in a listener to simulate the
ugliness seen out on live clusters.
2. A fix for the HBASE-2428 bug.
3. Facilities added to HBaseTestingUtility and MiniHBaseCluster.
4. I started refactoring the queue of RegionServerOperations in the master,
moving it out to a separate file to make it testable, but then I ran into the
fact that RegionServerOperations each have a reference to the master AND they
can put themselves back on the queue -- the circularity baffles. This has to be
fixed, but I will do it in a separate patch.
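To make item 1 concrete, here is a rough sketch of the listener idea. It is not the code in the patch: the names Operation, OperationListener, Dispatcher and CountingCancellingListener are made up for illustration, and the real RegionServerOperationListener interface may look different. It only shows how a test-registered listener could count operations and cancel selected ones before they run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ListenerSketch {
  /** Hypothetical stand-in for a RegionServerOperation. */
  public interface Operation {
    String name();
  }

  /** Hypothetical listener; the interface in the patch may differ. */
  public interface OperationListener {
    /** Called before an operation runs. Return false to cancel it. */
    boolean beforeProcess(Operation op);

    /** Called after an operation has run. */
    void afterProcess(Operation op);
  }

  /** Minimal queue-side dispatcher that consults listeners around each operation. */
  public static class Dispatcher {
    private final List<OperationListener> listeners = new ArrayList<>();

    public void register(OperationListener listener) {
      listeners.add(listener);
    }

    /** Returns true if the work ran, false if a listener cancelled it. */
    public boolean process(Operation op, Runnable work) {
      for (OperationListener l : listeners) {
        if (!l.beforeProcess(op)) {
          return false; // cancelled by a listener, work is skipped
        }
      }
      work.run();
      for (OperationListener l : listeners) {
        l.afterProcess(op);
      }
      return true;
    }
  }

  /** Test listener: counts every operation and cancels those with a given name. */
  public static class CountingCancellingListener implements OperationListener {
    private final String nameToCancel;
    private final AtomicInteger seen = new AtomicInteger();
    private final AtomicInteger completed = new AtomicInteger();

    public CountingCancellingListener(String nameToCancel) {
      this.nameToCancel = nameToCancel;
    }

    @Override
    public boolean beforeProcess(Operation op) {
      seen.incrementAndGet();
      return !nameToCancel.equals(op.name());
    }

    @Override
    public void afterProcess(Operation op) {
      completed.incrementAndGet();
    }

    public int seen() { return seen.get(); }

    public int completed() { return completed.get(); }
  }
}
```

A delaying listener would follow the same shape, sleeping or blocking on a latch inside beforeProcess instead of returning false.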
Here is a commit message with some detail on the commit:
{code}
M src/test/org/apache/hadoop/hbase/HBaseTestingUtility.java
Broke up startMiniHBaseCluster into smaller methods so I can mix and
match pieces of MiniHBaseCluster toward other ends.
(setupClusterBuildDir, isRunningCluster, getMiniHBaseCluster): Added.
M src/test/org/apache/hadoop/hbase/TestInfoServers.java
M src/test/org/apache/hadoop/hbase/TestRegionRebalancing.java
M src/test/org/apache/hadoop/hbase/HBaseClusterTestCase.java
M src/test/org/apache/hadoop/hbase/regionserver/TestLogRolling.java
M src/test/org/apache/hadoop/hbase/regionserver/DisabledTestRegionServerExit.java
M src/test/org/apache/hadoop/hbase/mapreduce/TestTableIndex.java
M src/test/org/apache/hadoop/hbase/mapred/TestTableIndex.java
Ripple from change of MiniHBaseCluster.getRegionThreads to
getRegionServerThreads.
M src/test/org/apache/hadoop/hbase/MiniHBaseCluster.java
Added new MiniHBaseClusterMaster that is an override of HMaster so
I can piggyback messages for designated regionservers atop the
heartbeat: close region, etc.
(getServerWithMeta, addMessageToSendRegionServer): Added.
A src/test/org/apache/hadoop/hbase/master/TestRegionServerOperationQueue.java
Stubbed out test of new RegionServerOperationQueue class.
A src/test/org/apache/hadoop/hbase/master/TestMasterTransistions.java
Test master cluster transitions. Includes unit test of hbase-2428.
M src/test/org/apache/hadoop/hbase/util/TestMigration.java
Disable migration test. Nothing to migrate yet and besides it was
trying to load a 0.19 hbase data tar.gz that has since been removed.
M src/contrib/stargate/build.xml
Added a copyright.
M src/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
Documentation and moved some methods from down at tail of the class
where they were in among static methods used parsing cmd-line usage
up above the usage and master startup static methods.
Added fix for issue broken by an hbase-1215 commit where we were
looking at wrong address (Grep for r796326 for more).
M src/java/org/apache/hadoop/hbase/LocalHBaseCluster.java
Moved bulk out to new JVMClusterUtil class and made accessible.
Added passing of HMaster.class to instantiate to facilitate
passing of TestHMaster.class.
M src/java/org/apache/hadoop/hbase/master/RegionServerOperationQueue.java
The RegionServerOperations queues moved out to their own class from
Master. Allows listeners to register and get notice before and after
a RegionServerOperation is processed. Includes part of bug fix for
hbase-2428. On an error processing a RegionServerOperation, we'd fall
into the IOException catch. We'd then put the operation back on the delay
queue for later processing, only we'd not reset its expiration. It
would therefore run again immediately... fail again, and so on.
Changed the return from process to be an enum rather than true/false
so I don't have to do things like call checkfs down in here and
I don't need to have a master instance around.
M src/java/org/apache/hadoop/hbase/master/ServerManager.java
How we add RegionServerOperation instances has changed to go via
RegionServerOperationQueue now.
M src/java/org/apache/hadoop/hbase/master/ProcessServerShutdown.java
(getDeadServerAddress): Added.
M src/java/org/apache/hadoop/hbase/master/RegionServerOperationListener.java
Listener interface to implement if interested in watching
RegionServerOperations.
M src/java/org/apache/hadoop/hbase/master/HMaster.java
Moved the RegionServerOperation code out of here to
RegionServerOperationQueue.
(adornRegionServerAnswer, constructMaster): Added.
M src/java/org/apache/hadoop/hbase/master/ProcessRegionOpen.java
Comment.
M src/java/org/apache/hadoop/hbase/master/ProcessRegionClose.java
Added in *fix* for 2428 NPE. For now did what happens in
ProcessRegionOpen for symmetry's sake but it needs to be replaced.
M src/java/org/apache/hadoop/hbase/master/RegionServerOperation.java
(resetExpiration): Added.
M src/java/org/apache/hadoop/hbase/master/ProcessRegionStatusChange.java
Javadoc.
M src/java/org/apache/hadoop/hbase/util/Threads.java
(threadDumpingIsAlive): Added from LocalHBaseCluster.
(sleep): Added.
M src/java/org/apache/hadoop/hbase/util/JVMClusterUtil.java
New class that has facility moved from LocalHBaseCluster with added
javadoc and made accessible. Needed for testing.
{code}
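The RegionServerOperationQueue note above describes the requeue half of the hbase-2428 fix: an operation put back on the delay queue without its expiration reset comes straight back off again. The sketch below illustrates that behaviour with java.util.concurrent.DelayQueue. It is an assumption-laden illustration, not the patch: the Op class, the Result enum values, the requeue helper and the injected clock are all invented here so the example is deterministic; only the resetExpiration name comes from the commit message.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

public class RequeueSketch {
  /** Hypothetical processing outcomes; the enum in the patch may use other names. */
  public enum Result { PROCESSED, REQUEUE, FAILED }

  /** Hypothetical delayed operation with a resettable expiration. */
  public static class Op implements Delayed {
    private final long delayMs;
    private final LongSupplier clockMs; // injected clock, an assumption for testability
    private long expiresAtMs;

    public Op(long delayMs, LongSupplier clockMs) {
      this.delayMs = delayMs;
      this.clockMs = clockMs;
      resetExpiration();
    }

    /** Push the expiration out to now + delay. */
    public final void resetExpiration() {
      expiresAtMs = clockMs.getAsLong() + delayMs;
    }

    @Override
    public long getDelay(TimeUnit unit) {
      return unit.convert(expiresAtMs - clockMs.getAsLong(), TimeUnit.MILLISECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
      return Long.compare(getDelay(TimeUnit.MILLISECONDS),
          other.getDelay(TimeUnit.MILLISECONDS));
    }
  }

  /** Acts on a processing result: only REQUEUE puts the operation back. */
  public static void handle(Result result, DelayQueue<Op> queue, Op op) {
    if (result == Result.REQUEUE) {
      requeue(queue, op);
    }
  }

  /** Resets the expiration BEFORE putting the operation back on the queue. */
  public static void requeue(DelayQueue<Op> queue, Op op) {
    op.resetExpiration();
    queue.put(op);
  }
}
```

Calling queue.put(op) directly on an already expired Op reproduces the tight fail-and-retry loop; going through requeue makes the next poll() return null until the delay has elapsed again.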
> Enhance test suite to be able to specify distributed scenarios
> --------------------------------------------------------------
>
> Key: HBASE-2414
> URL: https://issues.apache.org/jira/browse/HBASE-2414
> Project: Hadoop HBase
> Issue Type: Bug
> Components: test
> Affects Versions: 0.20.3
> Reporter: Karthik Ranganathan
> Assignee: stack
> Priority: Blocker
> Fix For: 0.20.5, 0.21.0
>
> Attachments: directcluster.txt, master2.txt, testmaster-v11.patch,
> testmaster-v14.txt, testmaster-v16.txt, testmaster-v4.patch,
> testmaster-v5.patch, testmaster-v7.patch, testmaster-v8.patch
>
>
> We keep finding good cases that are reasonably hard to test, yet the test
> suite does not encode these.
> For example:
> HBASE-2413 Master does not respect generation stamps, may result in meta
> getting permanently offlined
> HBASE-2312 Possible data loss when RS goes into GC pause while rolling HLog
> I am sure there are many more such "scenarios" we should put into the unit
> tests.