[ https://issues.apache.org/jira/browse/HBASE-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115261#comment-13115261 ]
jirapos...@reviews.apache.org commented on HBASE-3446:
------------------------------------------------------

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2065/
-----------------------------------------------------------

Review request for hbase and Jonathan Gray.


Summary
-------

Make the Meta* operations against meta retry. We do it by using HTable
instances (HTable calls HConnection.getRegionServerWithRetries for get, put,
scan, etc.). In 0.89, we had a special RetryableMetaOperation class, a
subclass of Callable, which reproduced the guts of
HConnection.getRegionServerWithRetries with its retry loop. Now we just use
HTable instead (costs some on setup, but otherwise we avoid duplicating code).
Upped the retries on the server side too.

Had a problem with CatalogJanitor. MetaReader and MetaEditor were relying
heavily on CatalogTracker (CT) methods getting proxy connections to meta and
root servers. CT needs to be cut back. This patch closes down access on
(unused) public methods and removes being able to get an HRegionInterface on
meta and root -- this stuff is used internally to CT only now; use MetaEditor
or MetaReader if you want to update or read catalog tables. Opening a new
issue to cut back CT use over the code base.

A little off topic, but I couldn't help it: since I was in MetaReader and
MetaEditor trying to clean them up, I ended up moving the meta migration code
out to its own class rather than have it all inside MetaEditor.

Here is some detail to help reviews.

M src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
  Clean up. Shut down access on some of these unused methods. Don't let
  out HRegionInterface instances in particular, since we are going away from
  raw HRI use to instead use a connection with retries, i.e. HTable.
  Comments on state of this class. Javadoc edits.
  getZooKeeperWatcher on HConnection is deprecated so don't use it in
  constructor.
  Override MetaNodeTracker and on node delete reset meta location (we used
  to do this over in MetaNodeTracker, but to do that we had to have a
  CatalogTracker over in the zk package, which is silly -- bad package
  encapsulation).
  (waitForRootServer) Renamed getRootServerConnection and changed it from
  public to package private.
  (waitForRootServerConnectionDefault, getRootServerConnection) Removed.
  (getMetaServerConnection) Changed from public to package private. Use
  MetaReader to read the meta location in root rather than a raw
  HRegionInterface so we get retrying.
  (remaining, timedout) Added utility methods.
  (waitForMetaServer) Changed from public to private.
  (resetMetaLocation) Made it synchronized on metaAvailable. Not all
  accesses were synchronized.
M src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
  Refactor to use HTable instead of raw HRegionInterface so we get retrying.
  For each operation we get an HTable, use it, then close it.
  (putToMetaTable, putsToMetaTable, etc.) Utility methods.
  (updateRootWithMetaMigrationStatus, etc.) Moved out to own class since
  these are for a one-time migration only.
A src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java
  New class that holds all the Meta* methods that update the meta table for
  the one-time migration done to meta on startup. This class is marked
  deprecated because it's going to be dropped in 0.94.
M src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java
  Retrofit methods in here to use fullScan methods with Visitor.
  (getCatalogRegionInterface, getCatalogRegionNameForTable,
  getCatalogRegionNameForRegion) Removed.
  (fullScan) Cleaned up the fullScans. Fixed up wrong javadoc.
  (fullScanOfResults) Renamed as fullScan override.
  (fullScanOfRoot) Added as deprecated. We should be doing this against zk.
  (metaRowToRegionPair, getServerNameFromResult) Moved to Result.
  (CollectAllVisitor) Added.
M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
  Handle a few cases where methods throw InterruptedException (don't let it
  out on the HBaseAdmin public API).
M src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
  Populate new exception, RetriesExhaustedException.ThrowableWithExtraContext,
  on failure. Call ServerCallable connect AFTER beforeCall rather than
  ServerCallable.instantiateServer BEFORE beforeCall.
M src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
  Add to DEBUG message the connection name we were using.
M src/main/java/org/apache/hadoop/hbase/client/Result.java
  (getServerNameFromCatalogResult, parseCatalogResult,
  parseHRegionInfoFromCatalogResult) Added.
M src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java
  Added new ThrowableWithExtraContext that takes extra context info.
M src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
  instantiateServer renamed as connect.
M src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
  Javadoc. Renamed instantiateServer as connect.
M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
  Javadoc. Use MetaReader method instead of handcoding.
M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
  Handle InterruptedException.
M src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java
  Handle InterruptedException.
M src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java
  Allow that hris can come back null when we ask for table regions.
M src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
  Remove import of CatalogTracker.
M src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java
  Use utility in MetaReader instead of handcoding it.
M src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java
  Use new HConnectionTestingUtility for mocking in tests (need to use it
  because it's a bit harder mocking tests now that we use HTable instead of
  the more direct HRegionInterface). Add some tests of broken-out utility
  methods.
M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java
  Add tests.
M src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
  Add test of 3669 retrying.
M src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
  New test utility that helps with mocking of HConnection, making it so we
  can mock an HConnection and then have an HTable use the mocked connection.
  Can do a mock or a spied-on HConnection.
M src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java
  The migration code moved. Reference new location.
M src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
M src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java
M src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
  Was waiting on wrong events. Was waiting on Opens rather than Splits. Fix.


This addresses bug hbase-3446.
https://issues.apache.org/jira/browse/hbase-3446


Diffs
-----

  src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8
  src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java 5bc3bb0
  src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java ac0bc38
  src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java ac60311
  src/main/java/org/apache/hadoop/hbase/catalog/MetaMigrationRemovingHTD.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/catalog/MetaReader.java 2afe70c
  src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java a55a4b1
  src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 5afaedf
  src/main/java/org/apache/hadoop/hbase/client/HTable.java b5cf639
  src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java da5b80d
  src/main/java/org/apache/hadoop/hbase/client/Result.java 8a0c1a9
  src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedException.java 89d2abe
  src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java 5ea38b4
  src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java 816f8b7
  src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java a069400
  src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java c53d3be
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java 06bf814
  src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java 1e5d83c
  src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 6ac6408
  src/main/java/org/apache/hadoop/hbase/master/handler/TableEventHandler.java c374d6f
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 5869c18
  src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java e72cfa2
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724
  src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 55257b3
  src/test/java/org/apache/hadoop/hbase/TestRegionRebalancing.java 9023af8
  src/test/java/org/apache/hadoop/hbase/catalog/TestCatalogTracker.java 538e809
  src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditor.java 84130e2
  src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java PRE-CREATION
  src/test/java/org/apache/hadoop/hbase/client/TestHCM.java f09944e
  src/test/java/org/apache/hadoop/hbase/client/TestMetaMigration.java 645bca6
  src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java 8fcdccc
  src/test/java/org/apache/hadoop/hbase/master/TestDistributedLogSplitting.java a0e6450
  src/test/java/org/apache/hadoop/hbase/master/TestMaster.java f473c80

Diff: https://reviews.apache.org/r/2065/diff


Testing
-------

All tests passed recently. Rerunning again.


Thanks,

Michael


> ProcessServerShutdown fails if META moves, orphaning lots of regions
> --------------------------------------------------------------------
>
>                 Key: HBASE-3446
>                 URL: https://issues.apache.org/jira/browse/HBASE-3446
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3446-v11.txt, 3446-v12.txt, 3446-v13.txt, 3446-v14.txt,
> 3446-v2.txt, 3446-v3.txt, 3446-v4.txt, 3446-v7.txt, 3446-v9.txt, 3446.txt,
> 3446v15.txt
>
>
> I ran a rolling restart on a 5 node cluster with lots of regions, and
> afterwards had LOTS of regions left orphaned. The issue appears to be that
> ProcessServerShutdown failed because the server hosting META was restarted
> around the same time as another server was being processed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
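
For readers following the summary: the failure quoted above is a catalog operation giving up when the server hosting META moved, and the patch routes such operations through a retrying path instead of a raw HRegionInterface. The following is a minimal standalone sketch of that general retry pattern only. It is not code from the patch and does not reproduce HConnection.getRegionServerWithRetries; `RetrySketch`, `callWithRetries`, and `RetriesExhausted` are hypothetical names invented for illustration, and the pause and attempt counts are arbitrary.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

/** Standalone illustration; none of these names come from the HBase patch. */
class RetrySketch {

  /** Hypothetical exception that keeps every failed attempt as extra context. */
  static class RetriesExhausted extends IOException {
    final List<Throwable> attempts;

    RetriesExhausted(int tries, List<Throwable> attempts) {
      super("Failed after " + tries + " attempts");
      this.attempts = attempts;
    }
  }

  /**
   * Runs op up to maxAttempts times, sleeping pauseMs between failed attempts.
   * Returns the first successful result, or throws RetriesExhausted.
   */
  static <T> T callWithRetries(Callable<T> op, int maxAttempts, long pauseMs)
      throws IOException, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    List<Throwable> failures = new ArrayList<>();
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (InterruptedException e) {
        throw e; // do not swallow interrupts; let the caller decide
      } catch (Exception e) {
        failures.add(e); // remember why this attempt failed
      }
      if (attempt < maxAttempts) {
        Thread.sleep(pauseMs); // give the moved region time to come back
      }
    }
    throw new RetriesExhausted(maxAttempts, failures);
  }

  public static void main(String[] args) throws Exception {
    AtomicInteger calls = new AtomicInteger();
    // Simulated catalog read that fails twice (as if meta were moving), then succeeds.
    String location = callWithRetries(() -> {
      if (calls.incrementAndGet() < 3) {
        throw new IOException("meta is moving");
      }
      return "meta-location";
    }, 5, 10L);
    System.out.println(location + " after " + calls.get() + " calls");
  }
}
```

Keeping the per-attempt failures on the final exception is in the same spirit as the ThrowableWithExtraContext addition mentioned above, though the actual fields and constructor in the patch may differ.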