----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://review.cloudera.org/r/1143/#review1755 -----------------------------------------------------------
Ship it! Commit after reviewing the below. This patch makes a HUGE difference. My trashing rolling restart of a 10 node cluster of 3k regions now works (mostly). Seems like an errant region the odd time I test but thats for hbck to figure now.... Great work JGray. trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java <http://review.cloudera.org/r/1143/#comment5678> How did this ever work? Why didn't typing catch this? trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <http://review.cloudera.org/r/1143/#comment5679> We doing this or not? Seems like we are. Should we change this back to plain HashMap... else when concurrent map it gives impression that its ok to meddle in regionPlan w/o first synchronizing? trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <http://review.cloudera.org/r/1143/#comment5680> want to doc this new param? (Not important -- its self-describing) trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <http://review.cloudera.org/r/1143/#comment5681> We don't have to do the remove anymore? Passing force new plan does this for us? trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <http://review.cloudera.org/r/1143/#comment5682> Should this test go into a method of its own? Is it repeated elsewhere? trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <http://review.cloudera.org/r/1143/#comment5683> What happens if state.isClosing now? (You dropped it from else if here) trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <http://review.cloudera.org/r/1143/#comment5684> Are all this.regions synchronized? Same for this.regionPlans? You might want to give it all a review before commit? trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <http://review.cloudera.org/r/1143/#comment5685> I think spell it out rather than NSRE in the message. trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <http://review.cloudera.org/r/1143/#comment5686> Is this ok? Calling assign inside a synchronize on RIT? Is that done everywhere? Or, this is the assign that takes a RegionState.. and expects the synchronize to be in place? trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <http://review.cloudera.org/r/1143/#comment5687> This is good... moving this stuff inside assign. trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <http://review.cloudera.org/r/1143/#comment5688> Nothing to do here? Job for hbck? trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <http://review.cloudera.org/r/1143/#comment5689> What happens here? I suppose we used shutdown server? trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <http://review.cloudera.org/r/1143/#comment5690> Ok trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java <http://review.cloudera.org/r/1143/#comment5691> Thats an awkward sentence for a debug message... trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java <http://review.cloudera.org/r/1143/#comment5692> We were not using this? Probably for the best. Probably stale by time we went to act on it? - stack On 2010-11-01 11:47:35, Jonathan Gray wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > http://review.cloudera.org/r/1143/ > ----------------------------------------------------------- > > (Updated 2010-11-01 11:47:35) > > > Review request for hbase and stack. > > > Summary > ------- > > Does cleanup of RIT timeouts according to document in progress. Still > finishing document but I'd like to get this patch tested before finalizing it. > > Also found some strange stuff in server shutdown handling that could have > easily led to some double assignment issues that stack was seeing. > > > This addresses bug HBASE-3181. > http://issues.apache.org/jira/browse/HBASE-3181 > > > Diffs > ----- > > trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 1029789 > trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java > 1029789 > trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1029789 > trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java > 1029789 > > trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java > 1029789 > > trunk/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java > 1029789 > > trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java > 1029789 > > trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java > 1029789 > trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1029789 > trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java > 1029789 > trunk/src/test/java/org/apache/hadoop/hbase/master/TestRollingRestart.java > 1029789 > > Diff: http://review.cloudera.org/r/1143/diff > > > Testing > ------- > > Working on tests now. This definitely changes some behavior that is tested > in the new TestMasterFailover so need to figure if the test should change or > whether we need to handle things like CLOSING. Maybe let it timeout a few > times? > > > Thanks, > > Jonathan > >
