[ 
https://issues.apache.org/jira/browse/HBASE-11445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-11445:
----------------------------------

    Attachment: hbase-11445.patch

[~jmhsieh] Could you check the patch? The failure seems to be caused by a race 
condition that occurs when all ProcedureMembers create subprocedures around the 
same time. Thanks.

> TestZKProcedure#testMultiCohortWithMemberTimeoutDuringPrepare is flaky
> ----------------------------------------------------------------------
>
>                 Key: HBASE-11445
>                 URL: https://issues.apache.org/jira/browse/HBASE-11445
>             Project: HBase
>          Issue Type: Bug
>          Components: snapshots
>            Reporter: Jeffrey Zhong
>             Fix For: 0.99.0, 0.98.3
>
>         Attachments: failure.txt, hbase-11445.patch
>
>
> A recent Jenkins build failed: 
> https://builds.apache.org/job/HBase-0.98/364/testReport/junit/org.apache.hadoop.hbase.procedure/TestZKProcedure/testMultiCohortWithMemberTimeoutDuringPrepare/
> Below are the related log messages, which show Member: 'one' joining twice:
> {noformat}
> 2014-06-29 19:26:34,101 DEBUG [member: 'three' subprocedure-pool11-thread-1] 
> procedure.ZKProcedureMemberRpcs(237): Member: 'one' joining acquired barrier 
> for procedure (op) in zk
> 2014-06-29 19:26:34,101 DEBUG [member: 'one' subprocedure-pool9-thread-1] 
> procedure.Subprocedure(162): Subprocedure 'op' locally acquired
> 2014-06-29 19:26:34,101 DEBUG [member: 'one' subprocedure-pool9-thread-1] 
> procedure.ZKProcedureMemberRpcs(237): Member: 'one' joining acquired barrier 
> for procedure (op) in zk
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
