[ 
https://issues.apache.org/jira/browse/HBASE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060861#comment-13060861
 ] 

[email protected] commented on HBASE-3867:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1019/#review978
-----------------------------------------------------------



/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/1019/#comment2045>

    I don't understand this. If we get an IOException connecting to the server 
hosting ROOT, we get a connection to a random server and then check whether it 
hosts META?



/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
<https://reviews.apache.org/r/1019/#comment2044>

    Did you intentionally leave this commented out here?


- Jonathan


On 2011-07-06 20:53:53, Ted Yu wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1019/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-07-06 20:53:53)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  When the cluster is stopped, the server that hosted the meta region is 
bq.  removed, and the cluster is then restarted, getCachedConnection() throws 
bq.  "NoRouteToHostException".
bq.  
bq.  NoRouteToHostException is now caught, similarly to how 
bq.  SocketTimeoutException is handled.
bq.  
bq.  If an IOException is still uncaught, we ask the Master for the list of 
bq.  servers and obtain a region connection from one of them.
bq.  
bq.  
bq.  This addresses bug HBASE-3867.
bq.      https://issues.apache.org/jira/browse/HBASE-3867
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    /src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 1142139 
bq.  
bq.  Diff: https://reviews.apache.org/r/1019/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Ran test suite.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.



> when cluster is stopped and server which hosted meta region is removed from 
> cluster, master breaks down after restarting cluster.
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3867
>                 URL: https://issues.apache.org/jira/browse/HBASE-3867
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.1, 0.90.2
>            Reporter: Liu Jia
>            Priority: Critical
>             Fix For: 0.90.2
>
>         Attachments: 3867-trunk-v2.txt, 3867-trunk-v3.txt, 
> CatalogTracker.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When the cluster is stopped, the server that hosted the meta region is 
> removed from the cluster, and the cluster is then restarted, the 
> following code throws "NoRouteToHostException":
> package org.apache.hadoop.hbase.catalog;
> public class CatalogTracker 
>  private HRegionInterface getMetaServerConnection(boolean refresh)
>   throws IOException, InterruptedException {
>     synchronized (metaAvailable) {
>       if (metaAvailable.get()) {
>         HRegionInterface current = getCachedConnection(metaLocation);
>         if (!refresh) {
>           return current;
>         }
>         if (verifyRegionLocation(current, this.metaLocation, META_REGION)) {
>           return current;
>         }
>         resetMetaLocation();
>       }
>       HRegionInterface rootConnection = getRootServerConnection();
>       if (rootConnection == null) {
>         return null;
>       }
>       HServerAddress newLocation =
>         MetaReader.readMetaLocation(rootConnection);
>       if (newLocation == null) {
>         return null;
>       }
>       // The following line throws the exception.
>       HRegionInterface newConnection = getCachedConnection(newLocation);
>       if (verifyRegionLocation(newConnection, this.metaLocation,
>           META_REGION)) {
>         setMetaLocation(newLocation);
>         return newConnection;
>       }
>       return null;
>     }
>   }
> // The following method does not handle the exception.
> public class CatalogTracker 
>   public boolean verifyMetaRegionLocation(final long timeout)
>   throws InterruptedException, IOException {
>     return getMetaServerConnection(true) != null;
>   }
> // The master calls the CatalogTracker method and does not handle the 
> // exception either.
> package org.apache.hadoop.hbase.master;
> public class HMaster
> int assignRootAndMeta()
>   throws InterruptedException, IOException, KeeperException {
>     int assigned = 0;
>     long timeout =
>       this.conf.getLong("hbase.catalog.verification.timeout", 1000);
>     // Work on ROOT region.  Is it in zk in transition?
>     boolean rit = this.assignmentManager.
>       processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO);
>     if (!catalogTracker.verifyRootRegionLocation(timeout)) {
>       this.assignmentManager.assignRoot();
>       this.catalogTracker.waitForRoot();
>       assigned++;
>     }
>     LOG.info("-ROOT- assigned=" + assigned + ", rit=" + rit +
>       ", location=" + catalogTracker.getRootLocation());
>     // Work on meta region
>     rit = this.assignmentManager.
>       processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.FIRST_META_REGIONINFO);
>     // After the cluster is restarted, the master breaks down here.
>     if (!this.catalogTracker.verifyMetaRegionLocation(timeout)) {
>       this.assignmentManager.assignMeta();
>       this.catalogTracker.waitForMeta();
>       // Above check waits for general meta availability but this does not
>       // guarantee that the transition has completed
>       this.assignmentManager.waitForAssignment(HRegionInfo.FIRST_META_REGIONINFO);
>       assigned++;
>     }
>     LOG.info(".META. assigned=" + assigned + ", rit=" + rit +
>       ", location=" + catalogTracker.getMetaLocation());
>     return assigned;
>   }
> Thanks to JunQiang Yuan in www.alipay.com  for providing information about 
> this bug. 
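
The handling proposed in the review request (treat NoRouteToHostException like SocketTimeoutException) can be sketched roughly as below. This is a minimal, self-contained illustration and not the actual patch: RegionConnection, Connector, and the string address are hypothetical stand-ins for HRegionInterface, the HBase connection lookup, and HServerAddress. Returning null on these two exceptions is an assumption about the intended behavior, chosen so that a caller such as verifyMetaRegionLocation() would see "no connection" instead of a propagated exception.

```java
import java.io.IOException;
import java.net.NoRouteToHostException;
import java.net.SocketTimeoutException;

class CachedConnectionSketch {
  // Hypothetical stand-in for HRegionInterface.
  interface RegionConnection {}

  // Hypothetical stand-in for whatever opens the connection to a region server.
  interface Connector {
    RegionConnection connect(String address) throws IOException;
  }

  // Returns null when the server cannot be reached, so the caller can treat
  // the catalog region as unavailable and reassign it. Any other IOException
  // still propagates to the caller.
  static RegionConnection getCachedConnection(Connector connector, String address)
      throws IOException {
    try {
      return connector.connect(address);
    } catch (SocketTimeoutException | NoRouteToHostException e) {
      // Assumed behavior: the host is unreachable or gone, report "no connection".
      return null;
    }
  }

  public static void main(String[] args) throws IOException {
    // Simulates the server that hosted the meta region having been removed.
    Connector unreachable = addr -> {
      throw new NoRouteToHostException("No route to host: " + addr);
    };
    RegionConnection c = getCachedConnection(unreachable, "removed-host:60020");
    System.out.println(c == null ? "meta server unreachable" : "connected");
  }
}
```

With a null result, a check of the form getMetaServerConnection(true) != null would evaluate to false, which is the branch in assignRootAndMeta() that reassigns the meta region.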

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
