[ 
https://issues.apache.org/jira/browse/HBASE-13217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521118#comment-14521118
 ] 

Matteo Bertozzi commented on HBASE-13217:
-----------------------------------------

{quote}Here is what the znode looks like when MASTER thinks the procedure is 
completed (obviously, myregionserver-5 is missing under 'reached' znode{quote}
my first guess is that when the procedure started the regionserver-5 was in 
transition and not in the list. there should be a list of "online 
regionservers" that the procedure uses to setup the latch used to decide when 
everyone is completed. that procedure code doesn't tolerate region in 
transition, so that may be the problem you are seeing. (but may be something 
else, this is just my first guess without looking at the code)

> Flush procedure fails in trunk due to ZK issue
> ----------------------------------------------
>
>                 Key: HBASE-13217
>                 URL: https://issues.apache.org/jira/browse/HBASE-13217
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: Stephen Yuan Jiang
>
> When ever I try to flush explicitly in the trunk code the flush procedure 
> fails due to ZK issue
> {code}
> ERROR: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable 
> via 
> stobdtserver3,16040,1426172670959:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable:
>  java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: 
> KeeperErrorCode = NoNode for 
> /hbase/flush-table-proc/acquired/TestTable/stobdtserver3,16040,1426172670959
>         at 
> org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
>         at 
> org.apache.hadoop.hbase.procedure.Procedure.isCompleted(Procedure.java:368)
>         at 
> org.apache.hadoop.hbase.procedure.flush.MasterFlushTableProcedureManager.isProcedureDone(MasterFlushTableProcedureManager.java:196)
>         at 
> org.apache.hadoop.hbase.master.MasterRpcServices.isProcedureDone(MasterRpcServices.java:905)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:47019)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2073)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: 
> org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: 
> java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: 
> KeeperErrorCode = NoNode for 
> /hbase/flush-table-proc/acquired/TestTable/stobdtserver3,16040,1426172670959
>         at 
> org.apache.hadoop.hbase.procedure.Subprocedure.cancel(Subprocedure.java:273)
>         at 
> org.apache.hadoop.hbase.procedure.ProcedureMember.controllerConnectionFailure(ProcedureMember.java:225)
>         at 
> org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.sendMemberAcquired(ZKProcedureMemberRpcs.java:254)
>         at 
> org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:166)
>         at 
> org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:52)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         ... 1 more
> {code}
> Once this occurs, even on restart of the RS the RS becomes unusable.  I have 
> verified that the ZK remains intact and there is no problem with it.  a bit 
> older version of trunk ( 3months) does not have this problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to