[jira] Commented: (HBASE-3159) Double play of OpenedRegionHandler for a single region; fails second time through and aborts Master

Jonathan Gray (JIRA) Wed, 27 Oct 2010 17:30:42 -0700

    [ https://issues.apache.org/jira/browse/HBASE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925617#action_12925617 ]


Jonathan Gray commented on HBASE-3159:
--------------------------------------

Probably not related, but I just uncovered a small race condition in 
AssignmentManager, around line 806 in assign(RegionState):

{noformat}
      // Send OPEN RPC. This can fail if the server on the other end is not up.
      serverManager.sendRegionOpen(plan.getDestination(), state.getRegion());
      // Transition RegionState to PENDING_OPEN
      state.update(RegionState.State.PENDING_OPEN);
{noformat}

We need to update the state to PENDING_OPEN before we send the RPC.  Otherwise 
the OPENING transition could arrive while we are still in OFFLINE state 
locally, and we would reject it.
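
To make the ordering problem concrete, here is a minimal, single-threaded 
sketch. It is not the HBase code: RegionStateSketch, AssignOrderSketch, OpenRpc 
and handleOpening are hypothetical stand-ins, and the rule that OPENING is only 
accepted from PENDING_OPEN is an assumption made for illustration. The fake RPC 
delivers the OPENING notification before it returns, which forces the worst-case 
interleaving deterministically.

```java
// Hypothetical stand-ins for illustration; these are not the real HBase classes.
class RegionStateSketch {
    enum State { OFFLINE, PENDING_OPEN, OPENING }

    private State state = State.OFFLINE;

    synchronized State get() { return state; }

    synchronized void update(State next) { state = next; }
}

class AssignOrderSketch {
    /** Stand-in for serverManager.sendRegionOpen(...). */
    interface OpenRpc { void send(); }

    /** Assumed master-side check: OPENING is accepted only from PENDING_OPEN. */
    static boolean handleOpening(RegionStateSketch rs) {
        synchronized (rs) {
            if (rs.get() != RegionStateSketch.State.PENDING_OPEN) {
                return false; // still OFFLINE locally, so the transition is rejected
            }
            rs.update(RegionStateSketch.State.OPENING);
            return true;
        }
    }

    /** Order from the quoted snippet: RPC first, state update second. */
    static void assignRpcFirst(RegionStateSketch rs, OpenRpc rpc) {
        rpc.send();
        rs.update(RegionStateSketch.State.PENDING_OPEN);
    }

    /** Proposed order: state update first, RPC second. */
    static void assignStateFirst(RegionStateSketch rs, OpenRpc rpc) {
        rs.update(RegionStateSketch.State.PENDING_OPEN);
        rpc.send();
    }

    /** Runs one assign where OPENING arrives before the RPC call returns. */
    static boolean openingAccepted(boolean stateFirst) {
        RegionStateSketch rs = new RegionStateSketch();
        boolean[] accepted = {false};
        OpenRpc rpc = () -> accepted[0] = handleOpening(rs);
        if (stateFirst) {
            assignStateFirst(rs, rpc);
        } else {
            assignRpcFirst(rs, rpc);
        }
        return accepted[0];
    }

    public static void main(String[] args) {
        System.out.println("RPC first:   OPENING accepted = " + openingAccepted(false));
        System.out.println("State first: OPENING accepted = " + openingAccepted(true));
    }
}
```

In this sketch the RPC-first order rejects the OPENING and the state-first 
order accepts it. What to do with the PENDING_OPEN state if the RPC itself 
fails is left out here and would need separate handling.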

> Double play of OpenedRegionHandler for a single region; fails second time 
> through and aborts Master
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3159
>                 URL: https://issues.apache.org/jira/browse/HBASE-3159
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 0.90.0
>
>         Attachments: hbase-meta-dupe-opened-master-only.txt, 
> hbase-meta-dupe-opened.txt, TestRollingRestart-v4.patch
>
>
> Here is master log with annotations: 
> http://people.apache.org/~stack/master.txt
> Region in question is:
> b8827a67a9d446f345095d25e1f375f7
> The running code is doctored: I've added a bit of logging (zk in 
> particular) and removed what I thought was provoking this condition, the 
> reassign inside assign when the server has gone away as we try the open rpc 
> (turns out we hit the condition even without that code in place).
> The log starts where the region in question times out in RIT.
> We assign it to 186.
> Notice how we see 'Handling transition' for this region TWICE.  This means 
> two OpenedRegionHandlers will be scheduled, which explains the failure to 
> delete a znode that is already gone.
> As best I can tell, the watcher for this region is triggered only once, 
> which is odd: how then do we get the double scheduling of 
> OpenedRegionHandler, and why am I seeing only what I presume is an OPENED 
> rather than OPENING, OPENING, OPENED?

