[jira] Commented: (HBASE-3159) Double play of OpenedRegionHandler for a single region; fails second time through and aborts Master

Jonathan Gray (JIRA) Wed, 27 Oct 2010 16:36:44 -0700

    [ https://issues.apache.org/jira/browse/HBASE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925605#action_12925605 ]


Jonathan Gray commented on HBASE-3159:
--------------------------------------

Stack, when you rerun your tests, turn off the ZK client logging and make sure
all of our ZK logging is set to pick up DEBUG level. There's always a slight
chance we'd want the raw ZK client logs if something really crazy is happening,
but there should be enough logging in our ZKW (ZooKeeperWatcher) and ZKUtil as
long as we pick up DEBUG.

One more thing: change the logRetrievedMsg method at the bottom of ZKUtil to the following:

{noformat}
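  private static void logRetrievedMsg(final ZooKeeperWatcher zkw,
      final String znode, final byte [] data, final boolean watcherSet) {
    // Skip building the message entirely unless DEBUG is enabled.
    if (!LOG.isDebugEnabled()) return;
    // For znodes under the assignment (unassigned) znode, decode and print the
    // RegionTransitionData; for any other znode, print the raw bytes as a
    // string, abbreviated to at most 32 characters.
    LOG.debug(zkw.prefix("Retrieved " + ((data == null)? 0: data.length) +
      " byte(s) of data from znode " + znode +
      (watcherSet? " and set watcher; ": "; data=") +
      (data == null? "null": (
          znode.startsWith(zkw.assignmentZNode) ?
              RegionTransitionData.fromBytes(data).toString()
              : StringUtils.abbreviate(Bytes.toString(data), 32)))));
  }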
{noformat}

The change is that we detect whether we're logging an unassigned znode (one
under zkw.assignmentZNode), and if so, we print the deserialized region
transition data instead of the abbreviated raw bytes. This will make debugging
this issue much simpler.
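For anyone who wants to see the branching without an HBase checkout, here is a standalone sketch of the same logic. It is not the ZKUtil code: the assignment znode path (`/hbase/unassigned`) is an assumed default passed in as a plain string, the `transitionDecoder` parameter is a hypothetical stand-in for `RegionTransitionData.fromBytes(data).toString()`, and `abbreviate` reimplements the behavior of commons-lang `StringUtils.abbreviate(str, 32)` so the example needs only the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/**
 * Standalone sketch of the branching in the proposed logRetrievedMsg().
 * Hypothetical stand-ins: assignmentZNode is a plain path string, and
 * transitionDecoder replaces RegionTransitionData.fromBytes(data).toString().
 */
public class RetrievedMsgSketch {

  /** Mirrors commons-lang StringUtils.abbreviate(str, maxWidth) for maxWidth >= 4. */
  static String abbreviate(String s, int maxWidth) {
    if (s.length() <= maxWidth) return s;
    return s.substring(0, maxWidth - 3) + "...";
  }

  static String describe(String assignmentZNode, String znode, byte[] data,
      boolean watcherSet, Function<byte[], String> transitionDecoder) {
    String payload;
    if (data == null) {
      payload = "null";
    } else if (znode.startsWith(assignmentZNode)) {
      // Unassigned znode: print the decoded region transition, not raw bytes.
      payload = transitionDecoder.apply(data);
    } else {
      // Any other znode: raw bytes as a UTF-8 string, abbreviated to 32 chars.
      payload = abbreviate(new String(data, StandardCharsets.UTF_8), 32);
    }
    return "Retrieved " + (data == null ? 0 : data.length) +
        " byte(s) of data from znode " + znode +
        (watcherSet ? " and set watcher; " : "; data=") + payload;
  }

  public static void main(String[] args) {
    byte[] raw = "0123456789012345678901234567890123456789"
        .getBytes(StandardCharsets.UTF_8);
    // Non-assignment znode: payload is abbreviated.
    System.out.println(describe("/hbase/unassigned", "/hbase/rs/foo", raw, false, d -> "?"));
    // Assignment znode: payload comes from the (stand-in) transition decoder.
    System.out.println(describe("/hbase/unassigned", "/hbase/unassigned/abc",
        new byte[] {1, 2}, true, d -> "state=OPENED"));
  }
}
```

Passing the decoder in as a function keeps the sketch free of HBase classes; in ZKUtil itself the decoding happens inline, as in the method above.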

> Double play of OpenedRegionHandler for a single region; fails second time through and aborts Master
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3159
>                 URL: https://issues.apache.org/jira/browse/HBASE-3159
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 0.90.0
>
>         Attachments: hbase-meta-dupe-opened-master-only.txt, hbase-meta-dupe-opened.txt
>
>
> Here is the master log, with annotations:
> http://people.apache.org/~stack/master.txt
> Region in question is:
> b8827a67a9d446f345095d25e1f375f7
> The running code is modified: I've added a bit of extra logging (ZK logging
> in particular), and I've also removed what I thought was provoking this
> condition, namely a reassign inside an assign when the server has gone away
> by the time we try the open RPC. (It turns out we hit the condition even
> without this code in place.)
> The log starts where the region in question times out in RIT (regions in transition).
> We assign it to 186.
> Notice how we see 'Handling transition' for this region TWICE. This means
> two OpenedRegionHandlers get scheduled, which explains the failure: the
> second one tries to delete a znode that is already gone.
> As best I can tell, the watcher for this region is triggered only once. That
> is odd: how, then, does OpenedRegionHandler get scheduled twice? And why do I
> not see OPENING, OPENING, OPENED, but only what I presume is an OPENED?
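The failure described in the report (two scheduled handlers, the second unable to delete a znode that is already gone) can be reproduced in miniature. This is a hypothetical in-memory illustration, not HBase code: a `ConcurrentHashMap` stands in for the unassigned znodes, `handleOpened` is a made-up stand-in for the znode delete that OpenedRegionHandler performs, and `IllegalStateException` stands in for ZooKeeper's NoNode error. It says nothing about why the transition is handled twice, which is the open question in this issue.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical illustration of the "double play": the map stands in for the
 * unassigned znodes, and IllegalStateException stands in for a NoNode error.
 */
public class DoublePlaySketch {
  // region encoded name -> last transition state recorded in its znode
  private final Map<String, String> unassignedZNodes = new ConcurrentHashMap<>();

  void transition(String region, String state) {
    unassignedZNodes.put(region, state);
  }

  /** Stand-in for the znode delete an opened-region handler performs. */
  void handleOpened(String region) {
    // remove(key, value) succeeds only if the znode still exists in OPENED state
    if (!unassignedZNodes.remove(region, "OPENED")) {
      throw new IllegalStateException("znode for " + region + " already gone");
    }
  }

  public static void main(String[] args) {
    DoublePlaySketch zk = new DoublePlaySketch();
    String region = "b8827a67a9d446f345095d25e1f375f7";
    zk.transition(region, "OPENED");
    zk.handleOpened(region);      // first handler: deletes the znode
    try {
      zk.handleOpened(region);    // second handler for the same transition
    } catch (IllegalStateException e) {
      // In the real Master this failure is what leads to the abort.
      System.out.println("second handler failed: " + e.getMessage());
    }
  }
}
```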

