[ 
https://issues.apache.org/jira/browse/HBASE-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282518#comment-13282518
 ] 

ramkrishna.s.vasudevan commented on HBASE-6088:
-----------------------------------------------

When we start a split, there are two steps in the zk node creation:
-> Create the node.
-> Write the data RS_ZK_SPLITTING into it.
The journal entry is made only after both steps have completed.
So if writing the data fails, rollback cannot clean up the node, 
because no journal entry was recorded for it.
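
To make the ordering problem concrete, here is a minimal in-memory sketch. It is not the SplitTransaction code: FakeZk, Split, JournalEntry and retrySucceeds are hypothetical names invented for illustration, and the znode path is a placeholder. It only assumes what is described above: rollback undoes the steps found in the journal, and the data write can fail after the node has been created.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these types are HBase or ZooKeeper APIs.
class SplitJournalSketch {

    enum JournalEntry { STARTED_SPLITTING_ZNODE }

    /** In-memory stand-in for ZooKeeper: znode path -> data. */
    static class FakeZk {
        final Map<String, String> nodes = new HashMap<>();
        boolean failNextSetData = false;

        void create(String path) throws IOException {
            if (nodes.containsKey(path)) {
                throw new IOException("Node " + path + " already exists");
            }
            nodes.put(path, "");
        }

        void setData(String path, String data) throws IOException {
            if (failNextSetData) {
                failNextSetData = false; // fail once, then behave normally
                throw new IOException("Failed setting data on " + path);
            }
            nodes.put(path, data);
        }

        void delete(String path) {
            nodes.remove(path);
        }
    }

    /** One split attempt with its own journal. */
    static class Split {
        final FakeZk zk;
        final String path;
        final boolean journalFirst;
        final Deque<JournalEntry> journal = new ArrayDeque<>();

        Split(FakeZk zk, String path, boolean journalFirst) {
            this.zk = zk;
            this.path = path;
            this.journalFirst = journalFirst;
        }

        void execute() throws IOException {
            if (journalFirst) {
                journal.push(JournalEntry.STARTED_SPLITTING_ZNODE);
            }
            zk.create(path);                      // step 1: create the node
            zk.setData(path, "RS_ZK_SPLITTING");  // step 2: write the state
            if (!journalFirst) {
                journal.push(JournalEntry.STARTED_SPLITTING_ZNODE);
            }
        }

        /** Undo only the steps that were journaled. */
        void rollback() {
            while (!journal.isEmpty()) {
                if (journal.pop() == JournalEntry.STARTED_SPLITTING_ZNODE) {
                    zk.delete(path);
                }
            }
        }
    }

    /** First attempt fails while writing data; reports whether a retry then succeeds. */
    static boolean retrySucceeds(boolean journalFirst) {
        FakeZk zk = new FakeZk();
        zk.failNextSetData = true;
        String path = "/hbase/unassigned/region"; // placeholder path

        Split first = new Split(zk, path, journalFirst);
        try {
            first.execute();
        } catch (IOException e) {
            first.rollback();
        }

        Split second = new Split(zk, path, journalFirst);
        try {
            second.execute();
            return true;
        } catch (IOException e) {
            second.rollback();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("journal after zk steps, retry succeeds: " + retrySucceeds(false));
        System.out.println("journal before zk steps, retry succeeds: " + retrySucceeds(true));
    }
}
```

In this sketch, journaling before the zk steps lets rollback delete the half-written node, so the retry no longer hits "already exists". A real fix would also have to make sure rollback only deletes a node that this region server created.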
                
>  Region splitting not happened for long time due to ZK exception while 
> creating RS_ZK_SPLITTING node
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6088
>                 URL: https://issues.apache.org/jira/browse/HBASE-6088
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.0
>            Reporter: Gopinathan A
>             Fix For: 0.94.1
>
>
> Region splitting not happened for long time due to ZK exception while 
> creating RS_ZK_SPLITTING node
> {noformat}
> 2012-05-24 01:45:41,363 INFO org.apache.zookeeper.ClientCnxn: Client session 
> timed out, have not heard from server in 26668ms for sessionid 
> 0x1377a75f41d0012, closing socket connection and attempting reconnect
> 2012-05-24 01:45:41,464 WARN 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient 
> ZooKeeper exception: 
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for /hbase/unassigned/bd1079bf948c672e493432020dc0e144
> {noformat}
> {noformat}
> 2012-05-24 01:45:43,300 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: 
> cleanupCurrentWriter  waiting for transactions to get synced  total 189377 
> synced till here 189365
> 2012-05-24 01:45:48,474 INFO 
> org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup 
> of failed split of 
> ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed 
> setting SPLITTING znode on 
> ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
> java.io.IOException: Failed setting SPLITTING znode on 
> ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
>       at 
> org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:242)
>       at 
> org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:450)
>       at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>       at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.zookeeper.KeeperException$BadVersionException: 
> KeeperErrorCode = BadVersion for 
> /hbase/unassigned/bd1079bf948c672e493432020dc0e144
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>       at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
>       at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:321)
>       at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:659)
>       at 
> org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:811)
>       at 
> org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:747)
>       at 
> org.apache.hadoop.hbase.regionserver.SplitTransaction.transitionNodeSplitting(SplitTransaction.java:919)
>       at 
> org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:869)
>       at 
> org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239)
>       ... 5 more
> 2012-05-24 01:45:48,476 INFO 
> org.apache.hadoop.hbase.regionserver.SplitRequest: Successful rollback of 
> failed split of 
> ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
> {noformat}
> {noformat}
> 2012-05-24 01:47:28,141 ERROR 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node 
> /hbase/unassigned/bd1079bf948c672e493432020dc0e144 already exists and this is 
> not a retry
> 2012-05-24 01:47:28,142 INFO 
> org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup 
> of failed split of 
> ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed 
> create of ephemeral /hbase/unassigned/bd1079bf948c672e493432020dc0e144
> java.io.IOException: Failed create of ephemeral 
> /hbase/unassigned/bd1079bf948c672e493432020dc0e144
>       at 
> org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:865)
>       at 
> org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239)
>       at 
> org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:450)
>       at 
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
> {noformat}
> Due to the above exception, region splitting kept failing continuously for 
> more than 5 hours.

