[ 
https://issues.apache.org/jira/browse/HBASE-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119258#comment-13119258
 ] 

ramkrishna.s.vasudevan commented on HBASE-4479:
-----------------------------------------------

As per my analysis:
When a master tries to come up and fails due to a ZK exception, we call 
abort, where we try once more to recover in case things are back to normal.
{code}
 if (t != null && t instanceof KeeperException.SessionExpiredException) {
      try {
        LOG.info("Primary Master trying to recover from ZooKeeper session " +
            "expiry.");
        return !tryRecoveringExpiredZKSession();
      } catch (Throwable newT) {
        LOG.error("Primary master encountered unexpected exception while " +
            "trying to recover from ZooKeeper session" +
            " expiry. Proceeding with server abort.", newT);
      }
    }
{code}
Here we try to assign ROOT and META, but the RS hosting them may not be online 
at that time.
So we carry on with processRIT and, this being a clean cluster startup, we try 
to assignAllUserRegions().
As part of this we call retainAssignment().
Here the number of servers itself is 0.
{code}
for (ServerName server : servers) {
      assignments.put(server, new ArrayList<HRegionInfo>());
    }
{code}
Since servers is empty, the loop body never runs and assignments.size() is 0.
{code}
else {
        int size = assignments.size();
        assignments.get(servers.get(RANDOM.nextInt(size))).add(region.getKey());
      }
{code}
RANDOM.nextInt(0) throws IllegalArgumentException, which makes the master abort.
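The failure can be reproduced outside HBase. The following is only a minimal sketch, not HBase code: the class name EmptyServerListRepro, the helper pickFails, and the use of plain strings for servers are assumptions made for illustration. It shows that picking a random element from an empty list the way the snippet above does ends in IllegalArgumentException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-alone reproduction; not part of HBase.
class EmptyServerListRepro {
  private static final Random RANDOM = new Random();

  // Picks a random server the same way the snippet above does.
  // Returns true if the pick fails with IllegalArgumentException.
  static boolean pickFails(List<String> servers) {
    try {
      int size = servers.size();
      // Random.nextInt(bound) requires bound > 0, so size == 0 is rejected.
      servers.get(RANDOM.nextInt(size));
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // No region servers online, as in the failed test run.
    System.out.println(pickFails(new ArrayList<String>())); // prints "true"
  }
}
```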
Though this showed up as a test case failure, there is a rare chance that it 
can also happen on a live cluster, and the attempt to bring the master back up 
after the ZK exception may then not work because of this.
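One possible way to handle the size 0 case is to check for an empty server list before doing any random pick. The sketch below is an assumption for illustration and not the actual HBase fix: RetainAssignmentSketch is a hypothetical class, regions and servers are plain strings instead of HRegionInfo and ServerName, and returning an empty plan (so the caller can retry once servers check in) is just one choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical simplified stand-in for retainAssignment(); not HBase code.
class RetainAssignmentSketch {
  private static final Random RANDOM = new Random();

  // regions maps a region name to the server that last hosted it (may be null).
  // Returns server -> regions to assign, or an empty plan if no server is online.
  static Map<String, List<String>> retainAssignment(
      Map<String, String> regions, List<String> servers) {
    Map<String, List<String>> assignments = new TreeMap<>();
    if (servers == null || servers.isEmpty()) {
      // Guard: with no servers, nextInt(0) below would throw.
      return assignments;
    }
    for (String server : servers) {
      assignments.put(server, new ArrayList<String>());
    }
    for (Map.Entry<String, String> region : regions.entrySet()) {
      String previous = region.getValue();
      if (previous != null && assignments.containsKey(previous)) {
        // Previous host is still online: keep the region where it was.
        assignments.get(previous).add(region.getKey());
      } else {
        // Otherwise fall back to a random online server.
        int size = servers.size();
        assignments.get(servers.get(RANDOM.nextInt(size))).add(region.getKey());
      }
    }
    return assignments;
  }
}
```

With this guard the caller gets an empty plan instead of an exception, and has to decide separately whether to wait for region servers or to retry the assignment later.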

Please correct me if I am wrong.





                
> TestMasterFailover failure in Hbase-0.92#17
> -------------------------------------------
>
>                 Key: HBASE-4479
>                 URL: https://issues.apache.org/jira/browse/HBASE-4479
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Minor
>
> When the master restarted, it was not able to get any servers online, and the 
> restart was a clean restart.
> Hence there were no regions to assign.
> Hence retainAssignment tries to get one of the regions and uses 
> RANDOM.nextInt(size).  Here size is 0.
> 0 is not an accepted argument here, so we got an exception that kept the 
> master from coming up, and the test case timed out.
> We still need to see whether having no regions was really expected when the 
> master came up, but this JIRA's intent is to deal with the scenario where the 
> size can be 0.


        
