[ 
https://issues.apache.org/jira/browse/HBASE-6748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-6748:
---------------------------------

    Attachment: hbase-6748.patch


I looked into this and was able to reproduce it once, but not consistently.

There are two issues:
1) the delete is issued with retry count = Long.MAX_VALUE
2) the new zk client instance created during master abort may not be seen by 
other threads because the field is not declared volatile
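
To illustrate the first issue, here is a minimal sketch (not the 
SplitLogManager code) of a callback that re-issues the delete from inside 
itself. It assumes the client runs the callback inline on the caller's thread 
once the session has expired, which is what the single thread name and the 
StackOverflowError in the log suggest. All names (InlineRetryDemo, asyncDelete, 
RetryingCallback) and the path are hypothetical.

```java
class InlineRetryDemo {
    /** Hypothetical stand-in for a ZooKeeper async delete callback. */
    interface DeleteCallback {
        void processResult(boolean sessionExpired, String path, long retriesLeft);
    }

    static int depth = 0;     // current nesting of asyncDelete calls
    static int maxDepth = 0;  // deepest nesting observed

    // Assumption: the session is already expired, so the "async" call
    // completes immediately and invokes the callback on the caller's thread.
    static void asyncDelete(String path, long retriesLeft, DeleteCallback cb) {
        depth++;
        maxDepth = Math.max(maxDepth, depth);
        try {
            cb.processResult(true, path, retriesLeft);
        } finally {
            depth--;
        }
    }

    /** Retries from inside the callback, so each retry adds stack frames. */
    static class RetryingCallback implements DeleteCallback {
        @Override
        public void processResult(boolean sessionExpired, String path, long retriesLeft) {
            if (sessionExpired && retriesLeft > 0) {
                asyncDelete(path, retriesLeft - 1, this);
            }
        }
    }

    static int maxDepthFor(long retries) {
        depth = 0;
        maxDepth = 0;
        asyncDelete("/hbase/splitlog/task", retries, new RetryingCallback());
        return maxDepth;
    }

    static boolean overflows(long retries) {
        try {
            maxDepthFor(retries);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("max depth with 3 retries: " + maxDepthFor(3));
        System.out.println("overflows with Long.MAX_VALUE: " + overflows(Long.MAX_VALUE));
    }
}
```

With a small retry count the nesting is bounded by the count. With 
Long.MAX_VALUE the stack runs out long before the counter does.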
    
The attached patch includes:
1) refactoring the code to handle ZK session expiry consistently in all zk 
async callback functions, as we currently do in the CreateRescan & GetData 
async callbacks
2) retrying the deletion in TimeoutMonitor, where other maintenance work is 
done, and removing the existing infinite-loop-like async calls, which may jam 
the callback queue
3) making RecoverableZooKeeper.zk volatile
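
The general shape of points 2) and 3) can be sketched as below. This is not 
the patch itself, only a simplified illustration: FakeZk, DeferredDeleteDemo, 
failedDeletions and timeoutMonitorChore are hypothetical names, and the 
synchronous delete stands in for the real async call.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class DeferredDeleteDemo {
    /** Hypothetical stand-in for the ZooKeeper handle. */
    static class FakeZk {
        private final boolean expired;

        FakeZk(boolean expired) {
            this.expired = expired;
        }

        /** Returns true if the delete succeeded, false on an expired session. */
        boolean delete(String path) {
            return !expired;
        }
    }

    // volatile: a handle swapped in by one thread becomes visible to the others
    private volatile FakeZk zk = new FakeZk(true);

    // Paths whose deletion failed; retried later instead of inside the callback
    private final Set<String> failedDeletions = ConcurrentHashMap.newKeySet();

    /** Simulates replacing the client instance after a session expiry. */
    void reconnect() {
        zk = new FakeZk(false);
    }

    /** One attempt only; a failure is recorded, not retried inline. */
    void deleteNode(String path) {
        if (!zk.delete(path)) {
            failedDeletions.add(path);
        }
    }

    /** Meant to be called periodically; makes one bounded pass over the failures. */
    void timeoutMonitorChore() {
        for (String path : failedDeletions) {
            if (zk.delete(path)) {
                failedDeletions.remove(path);
            }
        }
    }

    int pendingDeletions() {
        return failedDeletions.size();
    }
}
```

Because the retry happens from the periodic chore, the stack depth stays 
constant and a failing session cannot flood the callback queue with 
back-to-back retries.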

Thanks,
-Jeffrey

                
> Endless recursive of deleteNode happened in 
> SplitLogManager#DeleteAsyncCallback
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-6748
>                 URL: https://issues.apache.org/jira/browse/HBASE-6748
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.94.1, 0.96.0
>            Reporter: Jieshan Bean
>             Fix For: 0.96.0, 0.94.5
>
>         Attachments: hbase-6748.patch
>
>
> You can easily understand the problem from the logs below:
> {code}
> [2012-09-01 11:41:02,062] [WARN ] 
> [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] 
> [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] 
> create rc =SESSIONEXPIRED for 
> /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
>  remaining retries=3
> [2012-09-01 11:41:02,062] [WARN ] 
> [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] 
> [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] 
> create rc =SESSIONEXPIRED for 
> /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
>  remaining retries=2
> [2012-09-01 11:41:02,063] [WARN ] 
> [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] 
> [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] 
> create rc =SESSIONEXPIRED for 
> /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
>  remaining retries=1
> [2012-09-01 11:41:02,063] [WARN ] 
> [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] 
> [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] 
> create rc =SESSIONEXPIRED for 
> /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
>  remaining retries=0
> [2012-09-01 11:41:02,063] [WARN ] 
> [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] 
> [org.apache.hadoop.hbase.master.SplitLogManager 393] failed to create task 
> node/hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
> [2012-09-01 11:41:02,063] [WARN ] 
> [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] 
> [org.apache.hadoop.hbase.master.SplitLogManager 353] Error splitting 
> /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
> [2012-09-01 11:41:02,063] [WARN ] 
> [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] 
> [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] 
> delete rc=SESSIONEXPIRED for 
> /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
>  remaining retries=9223372036854775807
> [2012-09-01 11:41:02,064] [WARN ] 
> [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] 
> [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] 
> delete rc=SESSIONEXPIRED for 
> /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
>  remaining retries=9223372036854775806
> [2012-09-01 11:41:02,064] [WARN ] 
> [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] 
> [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] 
> delete rc=SESSIONEXPIRED for 
> /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
>  remaining retries=9223372036854775805
> [2012-09-01 11:41:02,064] [WARN ] 
> [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] 
> [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] 
> delete rc=SESSIONEXPIRED for 
> /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
>  remaining retries=9223372036854775804
> [2012-09-01 11:41:02,065] [WARN ] 
> [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] 
> [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] 
> delete rc=SESSIONEXPIRED for 
> /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
>  remaining retries=9223372036854775803
> ...................
> [2012-09-01 11:41:03,307] [ERROR] 
> [MASTER_SERVER_OPERATIONS-xh03,20000,1339549619270-1] 
> [org.apache.zookeeper.ClientCnxn 623] Caught unexpected throwable
> java.lang.StackOverflowError
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
