[
https://issues.apache.org/jira/browse/HBASE-7103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494652#comment-13494652
]
ramkrishna.s.vasudevan commented on HBASE-7103:
-----------------------------------------------
Ok Lars. I understand. No problem.
Just before we commit this, I have a suggestion:
{code}
String node = ZKAssign.getNodeName(zkw, region.getEncodedName());
if (!ZKUtil.createEphemeralNodeAndWatch(zkw, node, data.getBytes())) {
  throw new IOException("Failed create of ephemeral " + node);
}
// Transition node from SPLITTING to SPLITTING and pick up version so we
// can be sure this znode is ours; version is needed deleting.
return transitionNodeSplitting(zkw, region, serverName, -1);
{code}
Here, after creating the node, we transition it once from SPLITTING to
SPLITTING just to get the znode version. Can we instead get the znode version
right after creating the node?
If creation itself fails, there is no node at all. If it succeeds, the next
step adds the journal entry SET_SPLITTING_IN_ZK anyway.
With the transition the version ends up as 1, but if we don't do the
transition it will be 0.
The advantage is that the next time a parallel split comes in, the node will
already exist when it tries to create the znode, so its create fails and it
will not do anything to the znode during rollback. What do you feel? My
intention was to solve both 7103 and 6088.
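To make the version argument concrete, here is a minimal sketch against a toy
in-memory stand-in for ZooKeeper. ToyZk, its methods and the node path are
hypothetical and are not the ZKUtil/ZKAssign API. The only ZooKeeper behaviour
assumed is that a new znode starts at version 0 and each data update bumps the
version by one.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy illustration only; not the HBase or ZooKeeper API. */
public class ZnodeVersionSketch {

  /** Hypothetical in-memory model of znodes with data and a version. */
  static final class ToyZk {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, Integer> versions = new HashMap<>();

    /** Creates the node; returns its version (0), or -1 if it already exists. */
    int create(String path, String value) {
      if (data.containsKey(path)) {
        return -1;
      }
      data.put(path, value);
      versions.put(path, 0);
      return 0;
    }

    /** Updates data if expectedVersion matches; returns new version or -1. */
    int setData(String path, String value, int expectedVersion) {
      Integer current = versions.get(path);
      if (current == null || current != expectedVersion) {
        return -1;
      }
      data.put(path, value);
      versions.put(path, current + 1);
      return current + 1;
    }

    /** Deletes only when the caller holds the current version. */
    boolean delete(String path, int expectedVersion) {
      Integer current = versions.get(path);
      if (current == null || current != expectedVersion) {
        return false;
      }
      data.remove(path);
      versions.remove(path);
      return true;
    }

    boolean exists(String path) {
      return data.containsKey(path);
    }
  }

  public static void main(String[] args) {
    ToyZk zk = new ToyZk();
    String node = "/hbase/unassigned/P1"; // hypothetical path

    // Current flow: create, then SPLITTING -> SPLITTING just to learn the version.
    int created = zk.create(node, "SPLITTING");
    int transitioned = zk.setData(node, "SPLITTING", created);
    System.out.println("version after create = " + created
        + ", after self-transition = " + transitioned);
    zk.delete(node, transitioned);

    // Suggested flow: keep the version returned by create, no self-transition.
    int firstSplit = zk.create(node, "SPLITTING");

    // A parallel split of the same region fails to create, owns no version,
    // and so its rollback has nothing it is allowed to delete.
    int secondSplit = zk.create(node, "SPLITTING");
    if (secondSplit >= 0) {
      zk.delete(node, secondSplit);
    }
    System.out.println("node survives second split's rollback = " + zk.exists(node));

    // The first split can still move its own node on to SPLIT.
    System.out.println("first split transitions to version "
        + zk.setData(node, "SPLIT", firstSplit));
  }
}
```

In the real code the version would have to come from the create call or from a
stat read right after it; which of those is practical in ZKUtil is something
the patch would need to settle.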
Lars, I leave it to you. If you think so, we can revert this and address it in
the next version, 0.94.4. If not, we can try for a patch in this version. If
you are OK with that, I can submit a patch for the same.
> Need to fail split if SPLIT znode is deleted even before the split is
> completed.
> --------------------------------------------------------------------------------
>
> Key: HBASE-7103
> URL: https://issues.apache.org/jira/browse/HBASE-7103
> Project: HBase
> Issue Type: Bug
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: 7103-6088-revert.txt, HBASE-7103_testcase.patch
>
>
> This came up after the following mail in dev list
> 'infinite loop of RS_ZK_REGION_SPLIT on .94.2'.
> The following is the reason for the problem
> The following steps happen
> -> Initially the parent region P1 starts splitting.
> -> The split is going on normally.
> -> Another split starts at the same time for the same region P1. (Not sure
> why this started).
> -> Rollback happens seeing an already existing node.
> -> This node gets deleted in rollback and nodeDeleted Event starts.
> -> In nodeDeleted event the RIT for the region P1 gets deleted.
> -> Because of this there is no region in RIT.
> -> Now the first split gets over. Here the problem is we try to transition
> the node from SPLITTING to SPLIT, but the node does not even exist.
> But we don't take any action on this. We think it is successful.
> -> Because of this SplitRegionHandler never gets invoked.