[
https://issues.apache.org/jira/browse/HBASE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060870#comment-13060870
]
[email protected] commented on HBASE-3867:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1019/
-----------------------------------------------------------
(Updated 2011-07-06 22:08:37.499986)
Review request for hbase.
Summary
-------
When the cluster is stopped, the server that hosted the meta region is removed
from the cluster, and the cluster is then restarted, getCachedConnection()
throws "NoRouteToHostException".
NoRouteToHostException is now caught, similarly to how SocketTimeoutException
is handled.
If an IOException still goes uncaught, we ask the Master for the list of
servers and obtain a region connection from one of them.
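A minimal sketch of the handling described above, not the actual patch.
ConnectionSource, CachedConnectionSketch and connectWithFallback are
hypothetical names standing in for the HBase connection lookup, and server
addresses are plain strings here. The sketch assumes that an unreachable
server should yield null (so the caller treats the location as unverified),
and that the fallback simply tries each server in a list the Master would
supply.

```java
import java.io.IOException;
import java.net.NoRouteToHostException;
import java.net.SocketTimeoutException;
import java.util.List;

/** Hypothetical stand-in for the real connection lookup; not an HBase API. */
interface ConnectionSource<C> {
    C connect(String address) throws IOException;
}

final class CachedConnectionSketch {
    private CachedConnectionSketch() {}

    /**
     * Returns a connection, or null when the server cannot be reached.
     * NoRouteToHostException is treated the same way as
     * SocketTimeoutException; any other IOException propagates.
     */
    static <C> C getCachedConnection(ConnectionSource<C> source, String address)
            throws IOException {
        try {
            return source.connect(address);
        } catch (SocketTimeoutException | NoRouteToHostException e) {
            // Server is gone or unreachable: report "no connection" instead
            // of letting the exception escape to the caller.
            return null;
        }
    }

    /**
     * Assumed fallback: if the lookup still fails with an IOException,
     * try each server from a list (supplied by the Master in the real
     * system) and return the first connection obtained, or null.
     */
    static <C> C connectWithFallback(ConnectionSource<C> source, String address,
                                     List<String> otherServers) {
        try {
            return getCachedConnection(source, address);
        } catch (IOException e) {
            for (String candidate : otherServers) {
                try {
                    C connection = getCachedConnection(source, candidate);
                    if (connection != null) {
                        return connection;
                    }
                } catch (IOException ignored) {
                    // This candidate failed too; try the next one.
                }
            }
            return null;
        }
    }
}
```

With this shape, a caller such as verifyMetaRegionLocation() would see null
rather than an exception when the old meta server no longer exists, and could
go on to reassign the region.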
This addresses bug HBASE-3867.
https://issues.apache.org/jira/browse/HBASE-3867
Diffs (updated)
-----
/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 1143525
Diff: https://reviews.apache.org/r/1019/diff
Testing
-------
Ran test suite.
Thanks,
Ted
> when cluster is stopped and server which hosted meta region is removed from
> cluster, master breaks down after restarting cluster.
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-3867
> URL: https://issues.apache.org/jira/browse/HBASE-3867
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.1, 0.90.2
> Reporter: Liu Jia
> Priority: Critical
> Fix For: 0.90.2
>
> Attachments: 3867-trunk-v2.txt, 3867-trunk-v3.txt,
> CatalogTracker.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> When the cluster is stopped, the server which contains the meta region is
> removed from the cluster, and the cluster is then restarted,
> the following code throws "NoRouteToHostException":
> package org.apache.hadoop.hbase.catalog;
> public class CatalogTracker
>   private HRegionInterface getMetaServerConnection(boolean refresh)
>   throws IOException, InterruptedException {
>     synchronized (metaAvailable) {
>       if (metaAvailable.get()) {
>         HRegionInterface current = getCachedConnection(metaLocation);
>         if (!refresh) {
>           return current;
>         }
>         if (verifyRegionLocation(current, this.metaLocation, META_REGION)) {
>           return current;
>         }
>         resetMetaLocation();
>       }
>       HRegionInterface rootConnection = getRootServerConnection();
>       if (rootConnection == null) {
>         return null;
>       }
>       HServerAddress newLocation =
>         MetaReader.readMetaLocation(rootConnection);
>       if (newLocation == null) {
>         return null;
>       }
>       // The following line throws the exception.
>       HRegionInterface newConnection = getCachedConnection(newLocation);
>       if (verifyRegionLocation(newConnection, this.metaLocation,
>           META_REGION)) {
>         setMetaLocation(newLocation);
>         return newConnection;
>       }
>       return null;
>     }
>   }
> // The following method doesn't handle the exception.
> public class CatalogTracker
>   public boolean verifyMetaRegionLocation(final long timeout)
>   throws InterruptedException, IOException {
>     return getMetaServerConnection(true) != null;
>   }
> // The master calls the CatalogTracker's method and doesn't handle the
> // problem either.
> package org.apache.hadoop.hbase.master;
> public class HMaster
>   int assignRootAndMeta()
>   throws InterruptedException, IOException, KeeperException {
>     int assigned = 0;
>     long timeout = this.conf.getLong("hbase.catalog.verification.timeout",
>       1000);
>     // Work on ROOT region. Is it in zk in transition?
>     boolean rit = this.assignmentManager.
>       processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO);
>     if (!catalogTracker.verifyRootRegionLocation(timeout)) {
>       this.assignmentManager.assignRoot();
>       this.catalogTracker.waitForRoot();
>       assigned++;
>     }
>     LOG.info("-ROOT- assigned=" + assigned + ", rit=" + rit +
>       ", location=" + catalogTracker.getRootLocation());
>     // Work on meta region
>     rit = this.assignmentManager.
>       processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.FIRST_META_REGIONINFO);
>     // When the cluster is restarted, the master breaks down here.
>     if (!this.catalogTracker.verifyMetaRegionLocation(timeout)) {
>       this.assignmentManager.assignMeta();
>       this.catalogTracker.waitForMeta();
>       // Above check waits for general meta availability but this does not
>       // guarantee that the transition has completed
>       this.assignmentManager.waitForAssignment(HRegionInfo.FIRST_META_REGIONINFO);
>       assigned++;
>     }
>     LOG.info(".META. assigned=" + assigned + ", rit=" + rit +
>       ", location=" + catalogTracker.getMetaLocation());
>     return assigned;
>   }
> Thanks to JunQiang Yuan at www.alipay.com for providing information about
> this bug.