[ https://issues.apache.org/jira/browse/HBASE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925521#action_12925521 ]
Jonathan Gray commented on HBASE-3159:
--------------------------------------
This is fairly easy to "fix" in a way that will not make the master abort, but
that does not get to the underlying cause of the OPENED handler being triggered
twice. There's something going on there that we need to keep digging into (I'm
doing so now, but with the added logging it's not happening anymore).
The fix to prevent the abort is to transition the in-memory RIT (region in
transition) state to OPEN when we handle the OPENED event in
handleRegion(regionTransitionData). Exactly as we already do in the CLOSED
handling in that method, we need to add this:
+ regionState.update(RegionState.State.OPEN, data.getStamp());
When the next attempt to handle the OPENED state comes in, we won't process it,
because the region is no longer in one of the expected states (PENDING_OPEN or
OPENING), and so the second OpenedRegionHandler won't be executed.
But yeah, let's not put that fix in yet. It seems like something screwy is going
on with ZK watches being fired even though we don't expect a watch to be set, or
something along those lines.
Digging more...
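A minimal sketch of the guard described above, assuming a heavily simplified model: the `State` enum, `RegionState` class, `handleOpened` method and handler counter below are hypothetical stand-ins, not HBase's actual AssignmentManager code. It replays the same OPENED event twice for one region and counts how many OpenedRegionHandlers would be submitted, with and without the proposed `regionState.update(...)` line.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical, simplified model of the in-memory RIT check in handleRegion.
 * Not HBase's real AssignmentManager; names are illustrative only.
 */
class OpenedGuardSketch {

    enum State { OFFLINE, PENDING_OPEN, OPENING, OPEN, CLOSED }

    static final class RegionState {
        private State state;
        private long stamp;

        RegionState(State state, long stamp) { this.state = state; this.stamp = stamp; }

        void update(State newState, long newStamp) { this.state = newState; this.stamp = newStamp; }

        boolean isPendingOpen() { return state == State.PENDING_OPEN; }
        boolean isOpening() { return state == State.OPENING; }
    }

    /** Counts how many (hypothetical) OpenedRegionHandlers would be submitted. */
    final AtomicInteger openedHandlersSubmitted = new AtomicInteger();

    private final boolean applyFix;

    OpenedGuardSketch(boolean applyFix) { this.applyFix = applyFix; }

    /** Handles one OPENED event for a region. */
    void handleOpened(RegionState regionState, long stamp) {
        // Only accept OPENED if we were expecting it (PENDING_OPEN or OPENING).
        if (regionState == null || (!regionState.isPendingOpen() && !regionState.isOpening())) {
            return; // unexpected state: ignore the event
        }
        if (applyFix) {
            // The proposed one-line fix: mark the region OPEN in memory so a
            // replayed OPENED event fails the state check above.
            regionState.update(State.OPEN, stamp);
        }
        openedHandlersSubmitted.incrementAndGet(); // stands in for submitting the handler
    }

    /** Delivers the same OPENED event twice and returns the handler count. */
    static int countHandlers(boolean applyFix) {
        OpenedGuardSketch am = new OpenedGuardSketch(applyFix);
        RegionState rs = new RegionState(State.OPENING, 1L);
        am.handleOpened(rs, 2L); // first OPENED event
        am.handleOpened(rs, 3L); // duplicate OPENED event for the same region
        return am.openedHandlersSubmitted.get();
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + countHandlers(false)); // 2
        System.out.println("with fix:    " + countHandlers(true));  // 1
    }
}
```

Without the update, the region stays in OPENING, so both events pass the check and two handlers are scheduled; with it, the replayed event sees OPEN and is dropped. As noted above, this only masks the symptom and says nothing about why the event is delivered twice.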
> Double play of OpenedRegionHandler for a single region; fails second time
> through and aborts Master
> ---------------------------------------------------------------------------------------------------
>
> Key: HBASE-3159
> URL: https://issues.apache.org/jira/browse/HBASE-3159
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Priority: Blocker
> Fix For: 0.90.0
>
> Attachments: hbase-meta-dupe-opened-master-only.log,
> hbase-meta-dupe-opened.log
>
>
> Here is the master log with annotations:
> http://people.apache.org/~stack/master.txt
> Region in question is:
> b8827a67a9d446f345095d25e1f375f7
> The running code is doctored: I've added a bit of logging (ZK logging in
> particular), and I've also removed what I thought was provoking this
> condition, namely a reassign inside an assign if the server has gone away
> when we try the open RPC. (It turns out we have the condition even without
> this code in place.)
> The log starts where the region in question times out in RIT.
> We assign it to 186.
> Notice how we see 'Handling transition' for this region TWICE. This means
> two OpenedRegionHandlers will be scheduled, hence the failure when the second
> one tries to delete a znode that is already gone.
> As best I can tell, the watcher for this region is triggered only once,
> which is odd: how, then, do we get the double scheduling of
> OpenedRegionHandler? Also, why am I not seeing OPENING, OPENING, OPENED, but
> only what I presume is an OPENED?
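A hypothetical simulation of why the second handler fails, assuming that each OpenedRegionHandler deletes the region's znode and that a failed delete is fatal. `FakeZk`, `NoNodeException` and `runOpenedHandler` are illustrative stand-ins (real ZooKeeper reports this case as `KeeperException.NoNodeException`), and the `/hbase/unassigned/...` path is assumed for illustration only.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical simulation (not ZooKeeper or HBase code): deleting a znode is
 * not idempotent, so a second handler for the same region finds nothing to delete.
 */
class DoubleDeleteSketch {

    /** Stand-in for ZooKeeper's "no node" error. */
    static final class NoNodeException extends Exception {
        NoNodeException(String path) { super("NoNode for " + path); }
    }

    /** Minimal in-memory stand-in for the region znodes. */
    static final class FakeZk {
        private final Set<String> nodes = new HashSet<>();

        void create(String path) { nodes.add(path); }

        void delete(String path) throws NoNodeException {
            if (!nodes.remove(path)) {
                throw new NoNodeException(path); // node already gone
            }
        }
    }

    /** Hypothetical handler: deletes the region's znode; returns false on failure. */
    static boolean runOpenedHandler(FakeZk zk, String path) {
        try {
            zk.delete(path);
            return true;  // handler completed normally
        } catch (NoNodeException e) {
            return false; // in the reported bug, this failure aborts the Master
        }
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        String path = "/hbase/unassigned/b8827a67a9d446f345095d25e1f375f7"; // assumed layout
        zk.create(path);
        // Two 'Handling transition' events mean two handlers are scheduled.
        boolean first = runOpenedHandler(zk, path);
        boolean second = runOpenedHandler(zk, path);
        System.out.println("first=" + first + " second=" + second); // first=true second=false
    }
}
```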
--
This message is automatically generated by JIRA.