[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116019#comment-13116019 ]
jirapos...@reviews.apache.org commented on HBASE-3446: ------------------------------------------------------ bq. On 2011-09-27 23:36:00, Jonathan Gray wrote: bq. > There is a lot of excellence in here. I'm going to look at the code itself with this diff applied to try and understand where/how CT is now being used. I'm a little unclear between the lines you'd like to draw and the lines you actually draw in this diff. bq. > bq. > Great work! Sorry about that. Let me get you better answer to your question. I think its not very clear because I myself was unclear on scope of CT when I started in. What this patch has here is an attempt at shutting down CT scope with subsequent work put off for HBASE-4495. bq. On 2011-09-27 23:36:00, Jonathan Gray wrote: bq. > src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java, line 513 bq. > <https://reviews.apache.org/r/2065/diff/1/?file=45907#file45907line513> bq. > bq. > maybe note here that you should not be synchronized on metaAvailable (and it will do so in the method)... the next method below is nicely clear in this regard Will do bq. On 2011-09-27 23:36:00, Jonathan Gray wrote: bq. > src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java, line 575 bq. > <https://reviews.apache.org/r/2065/diff/1/?file=45907#file45907line575> bq. > bq. > verify the connection works, and also that the server is actually hosting the region we think it is... the comment makes me think this is looking up which server hosts the passed region but it's just verifying if we can connect to the server we think is hosting the region and verifies whether it's hosting it or not (so this fails if we can't connect or if the region is not on this server) Good point. bq. On 2011-09-27 23:36:00, Jonathan Gray wrote: bq. > src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java, line 194 bq. > <https://reviews.apache.org/r/2065/diff/1/?file=45908#file45908line194> bq. > bq. > i'm still trying to understand exactly what you've changed and what is still a TODO, but this looks much nicer now! :) In the above, we'd get the HRegionInterface and do the invocation on the actual Interface. The alternative steps back and asks an HTable instance to do the work. If an issue with former we'd just let the exception out. In the alternative, we'll do HTable retries before we let the exception out (and the retries are boosted in server-context). bq. On 2011-09-27 23:36:00, Jonathan Gray wrote: bq. > src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java, line 2 bq. > <https://reviews.apache.org/r/2065/diff/1/?file=45909#file45909line2> bq. > bq. > missing copyright and year? Turns out that copyright is not actually needed https://issues.apache.org/jira/browse/HBASE-3870 bq. On 2011-09-27 23:36:00, Jonathan Gray wrote: bq. > src/main/java/org/apache/hadoop/hbase/client/Result.java, line 568 bq. > <https://reviews.apache.org/r/2065/diff/1/?file=45915#file45915line568> bq. > bq. > seems like this should be moved to static methods in a helper class rather than exposing to our client-side Result OK. It was kinda nice being able to do result.getServerNameFromCatalogResult. I suppose it does pollute. I can move it back to MetaReader since that seems like next best place. You are right shouldn't be generally public stuff. Will fix. bq. On 2011-09-27 23:36:00, Jonathan Gray wrote: bq. > src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java, line 70 bq. > <https://reviews.apache.org/r/2065/diff/1/?file=45918#file45918line70> bq. > bq. > this seems like an important public method. i like the rename and your additional comments, but maybe we should add more. default behavior is to use a cached location, if one is not found, it is looked up in a catalog. setting reload to true bypasses the cache and forces the lookup to a catalog. and then, under what cases do we get an exception? does this verify that the server is actually hosting the region? or it just looks up in the catalog (i guess failure there could cause IOE) and if it finds something, just returns a connection to that RS (w/ no verification)... correct? Will look into this. bq. On 2011-09-27 23:36:00, Jonathan Gray wrote: bq. > src/main/java/org/apache/hadoop/hbase/master/HMaster.java, lines 1041-1047 bq. > <https://reviews.apache.org/r/2065/diff/1/?file=45921#file45921line1041> bq. > bq. > why do you remove the javadoc on this method? Will look into this. bq. On 2011-09-27 23:36:00, Jonathan Gray wrote: bq. > src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java, line 132 bq. > <https://reviews.apache.org/r/2065/diff/1/?file=45931#file45931line132> bq. > bq. > huh? :) Let me fix. bq. On 2011-09-27 23:36:00, Jonathan Gray wrote: bq. > src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java, line 52 bq. > <https://reviews.apache.org/r/2065/diff/1/?file=45932#file45932line52> bq. > bq. > 30,000 ft desc? i guess test name is self descriptive? :) Will do. - Michael ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/2065/#review2111 ----------------------------------------------------------- On 2011-09-27 06:38:09, Michael Stack wrote: bq. bq. ----------------------------------------------------------- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/2065/ bq. ----------------------------------------------------------- bq. bq. (Updated 2011-09-27 06:38:09) bq. bq. bq. Review request for hbase and Jonathan Gray. bq. bq. bq. Summary bq. ------- bq. bq. Make the Meta* operations against meta retry. We do it by using HTable instances. bq. (HTable calls HConnection.getRegionServerWithRetries for get, put, scan etc). bq. In 0.89, we had special RetryableMetaOperation class that was a bq. subclass of Callable which reproduced the guts of HConnection.getRegionServerWithRetries bq. with its retry loop. Now we just use HTable instead (Costs some on setup but bq. otherwise, we avoid duplicating code). Upped the retries on serverside too. bq. bq. Had problem with CatalogJanitor. MetaReader and MetaEditor were relying bq. heavily on CT methods getting proxy connections to meta and root servers. bq. CT needs to be cut back. This patch closes down access on (unused) public bq. methods and removes being able to get an HRegionInterface on meta and root bq. -- this stuff is used internally to CT only now; use MetaEditor or bq. MetaReader if you want to update or read catalog tables. Opening new issue bq. to cutback CT use over the code base. bq. bq. A little off topic but couldn't help it since was in MetaReader and MetaEditor bq. trying to clean them up, I ended up moving meta migration code out to its bq. own class rather than have it in all inside in MetaEditor. bq. bq. Here is some detail to help reviews. bq. bq. M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java bq. Clean up. Shutdown access on some of these unused methods. Don't bq. let out HRegionInterface instances in particular since we are going bq. away from raw HRI use to instead use a connection with retries: bq. i.e. HTable. bq. bq. Comments on state of this class. Javadoc edits. bq. getZooKeeperWatcher on HConnection is deprecated so don't use it bq. in constructor. Override MetaNodeTracker and on node delete bq. reset meta location (We used to do this over in MetaNodeTracker bq. but to do that we had to have a CatalogTracker over in zk package bq. which is silly -- bad package encapsulation). bq. bq. (waitForRootServer) Renamed getRootServerConnection and change it bq. from public to package private. bq. (waitForRootServerConnectionDefault, getRootServerConnection) Removed. bq. (getMetaServerConnection) Change from public to package private. bq. Use MetaReader to read the meta location in root rather than a bq. raw HRegionInterface so we get retrying. bq. (remaining, timedout) Added utility methods. bq. (waitForMetaServer) Changed from public to private. bq. (resetMetaLocation) Made it synchronized on metaAvailable. bq. Not all accesses were synchronized. bq. bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java bq. Refactor to use HTable instead of raw HRegionInterface so we get bq. retrying. For each operation we get an HTable, use it, then close it. bq. (putToMetaTable, putsToMetaTable, etc) Utility methods. bq. (updateRootWithMetaMigrationStatus, etc.) Moved out to own bq. class since these classes are for a one-time migration only. bq. bq. A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java bq. New class that holds all Meta* methods updating meta table used bq. doing the one-time migration done to meta on startup. This class bq. is marked deprecated because its going to be dropped in 0.94. bq. bq. M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java bq. Retrofit methods in here to use fullScan methods with Visitor. bq. (getCatalogRegionInterface, getCatalogRegionNameForTable, bq. getCatalogRegionNameForRegion) Removed. bq. (fullScan) Cleaned up the fullScans. Fixed up wrong javadoc. bq. (fullScanOfResults) Renamed as fullScan override. bq. (fullScanOfRoot) Added as deprecated. We should be doing bq. this against zk. bq. (metaRowToRegionPair, getServerNameFromResult) Moved to Result bq. (CollectAllVisitor) Added bq. M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java bq. Handle few cases where methods throw InterruptedException bq. (Don't let it out on the HBaseAdmin public API) bq. bq. M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java bq. Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext bq. on failure. Call ServerCallable connect AFTER beforeCall rather than bq. ServerCallable.instantiateServer BEFORE beforeCall. bq. bq. M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java bq. Add to DEBUG message the connection name we were using. bq. bq. M src/main/java/org/apache/hadoop/hbase/client/Result.java bq. (getServerNameFromCatalogResult, parseCatalogResult, bq. parseHRegionInfoFromCatalogResult) Added bq. bq. M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java bq. Added new ThrowableWithExtraContext that takes extra context info. bq. bq. M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java bq. instantiateServer renamed as connect bq. bq. M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java bq. Javadoc. Renamed instantiateServer as connect. bq. bq. M src/main/java/org/apache/hadoop/hbase/master/HMaster.java bq. Javadoc. Use MetaReader method instead of handcoding. bq. bq. M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java bq. Handle InterruptedException bq. bq. M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java bq. Handle InterruptedException bq. bq. M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java bq. Allow hris can come back null when we ask for table regions. bq. bq. M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java bq. Remove import of CatalogTracker. bq. bq. M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java bq. Use utility in MetaReader instead of handcode it. bq. bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java bq. Use new HConnectionTestingUtility mocking tests (need to use it bq. because its a bit harder mocking tests now that we use HTable instead bq. of the more direct HRegionInterface). bq. Add some tests of broken out utility methods. bq. bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java bq. Add tests bq. bq. M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java bq. Add test of 3669 retrying. bq. bq. M src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java bq. New test utility that helps with mock of HConnection making it so can mock bq. an HConnection and then have an HTable use the mocked connection. Can do bq. a mock or a spied on HConnection bq. bq. M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java bq. The migration code moved. Reference new location. bq. bq. M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java bq. M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java bq. M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java bq. Was waiting on wrong events. Was waiting on Opens rather than Splits. Fix. bq. bq. bq. This addresses bug hbase-3446. bq. https://issues.apache.org/jira/browse/hbase-3446 bq. bq. bq. Diffs bq. ----- bq. bq. src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8 bq. src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 5bc3bb0 bq. src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java ac0bc38 bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311 bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION bq. src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c bq. src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1 bq. src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf bq. src/main/java/org/apache/hadoop/hbase/client/HTable.java b5cf639 bq. src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java da5b80d bq. src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9 bq. src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe bq. src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 5ea38b4 bq. src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7 bq. src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400 bq. src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java c53d3be bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java 06bf814 bq. src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c bq. src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 6ac6408 bq. src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java c374d6f bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 5869c18 bq. src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java e72cfa2 bq. src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724 bq. src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3 bq. src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java 9023af8 bq. src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809 bq. src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2 bq. src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION bq. src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION bq. src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e bq. src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6 bq. src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc bq. src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450 bq. src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80 bq. bq. Diff: https://reviews.apache.org/r/2065/diff bq. bq. bq. Testing bq. ------- bq. bq. All tests passed recently. Rerunning again. bq. bq. bq. Thanks, bq. bq. Michael bq. bq. > ProcessServerShutdown fails if META moves, orphaning lots of regions > -------------------------------------------------------------------- > > Key: HBASE-3446 > URL: https://issues.apache.org/jira/browse/HBASE-3446 > Project: HBase > Issue Type: Bug > Components: master > Affects Versions: 0.90.0 > Reporter: Todd Lipcon > Assignee: stack > Priority: Blocker > Fix For: 0.92.0 > > Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt, > 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt, > 3446v15.txt > > > I ran a rolling restart on a 5 node cluster with lots of regions, and > afterwards had LOTS of regions left orphaned. The issue appears to be that > ProcessServerShutdown failed because the server hosting META was restarted > around the same time as another server was being processed -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira