[ 
https://issues.apache.org/jira/browse/HBASE-8097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603218#comment-13603218
 ] 

nkeywal commented on HBASE-8097:
--------------------------------

bq. The timestamp updated in DeadServer.add seems bogus to me.
You're right. It should be a putIfAbsent if it was a concurrentMap. The methods 
are synchronized and it should be critical, so the following implementation 
should be correct:
{code}
  /**
   * Adds the server to the dead server list if it's not there already.
   * @param sn the server name
   */
  public synchronized void add(ServerName sn) {
    this.numProcessing++;
    if (!deadServers.containsKey(sn)){
      deadServers.put(sn, EnvironmentEdgeManager.currentTimeMillis());
    }
  }
{code}

Tell me if you want me to create another JIRA for this or if you don't mind 
adding this into this JIRA.

For numProcessing, it seems there are several bugs as well: if you have an 
exception anywhere (for example in verifyAndAssignMetaWithRetries();) we have a 
broken state: we haven't decreased the numProcessing but we're not working on 
it anymore. If we want to be exception safe (as ServerShutdownHandler is) we 
need the finally imho. I can't find a better solution than:

{code}
  @Override
  public void process() throws IOException {
    boolean gotException = true;
    try {
      try {
        LOG.info("Splitting META logs for " + serverName);
        if (this.shouldSplitHlog) {
          this.services.getMasterFileSystem().splitMetaLog(serverName);
        }
      } catch (IOException ioe) {
        this.deadServers.add(serverName);
        this.services.getExecutorService().submit(this);
        throw new IOException("failed log splitting for " +
            serverName + ", will retry", ioe);
      }


      // Assign root and meta if we were carrying them.
      if (isCarryingMeta()) { // .META.
        // Check again: region may be assigned to other where because of RIT
        // timeout
        if (this.services.getAssignmentManager().isCarryingMeta(serverName)) {
          LOG.info("Server " + serverName
              + " was carrying META. Trying to assign.");
          this.services.getAssignmentManager().regionOffline(
              HRegionInfo.FIRST_META_REGIONINFO);
          verifyAndAssignMetaWithRetries();
        } else {
          LOG.info("META has been assigned to otherwhere, skip assigning.");
        }
      }
      gotException = false;
    } finally {
      if (gotException){
        // If we had an exception we can't rely on super.process to say we 
finished the process.
        this.deadServers.finish(serverName);
      }
    }
    super.process();
  }
{code}

I can't say I like, but it should do the job...


                
> MetaServerShutdownHandler may potentially keep bumping up 
> DeadServer.numProcessing
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-8097
>                 URL: https://issues.apache.org/jira/browse/HBASE-8097
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jeffrey Zhong
>            Assignee: Jeffrey Zhong
>             Fix For: 0.96.0
>
>         Attachments: 8097.txt, hbase-8097_1.patch
>
>
> {code}
>     } catch (IOException ioe) {
>       this.services.getExecutorService().submit(this);
>       this.deadServers.add(serverName);
>       throw new IOException("failed log splitting for " +
>           serverName + ", will retry", ioe);
>     }
> {code}
> this.deadServers.add(serverName); will keep incrementing 
> DeadServer.numProcessing
> We can't get rid of numProcessing by just checking deadServers.size() because 
> deadServers is also used to report some historically failed RSs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to