[
https://issues.apache.org/jira/browse/ZOOKEEPER-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996260#comment-12996260
]
tsulin commented on ZOOKEEPER-981:
----------------------------------
I have the same problem.
It can be reproduced with the following steps:
1. Create two zk handles, A and B.
2. Use A to create an ephemeral node under path P.
3. Use B to get the children of P and set a watcher.
4. In the watcher function, get the children of P again and set the watcher again.
5. Close A.
6. Close B.
This reproduces the problem roughly 10% of the time.
I found that zookeeper_close is called three times when closing B, and destroy
is called twice, one of those calls coming from do_completion.
I think there is a race condition in zookeeper_close:
int zookeeper_close(zhandle_t *zh)
{
    int rc=ZOK;
    if (zh==0)
        return ZBADARGUMENTS;

    zh->close_requested=1;
    if (inc_ref_counter(zh,0)!=0) {
        /* Signal any synchronous completions before joining the threads */
        enter_critical(zh);
        free_completions(zh,1,ZCLOSING);
        leave_critical(zh);

        /* If do_completion finishes before this point, zookeeper_close
         * will be called twice: once from do_completion and once from
         * adaptor_finish. */
        adaptor_finish(zh);
        return ZOK;
    }
    if(zh->state==ZOO_CONNECTED_STATE){
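
To make the suspected interleaving concrete, here is a small single-threaded toy model in C. It is not the client library's code: toy_handle, toy_prolog, toy_epilog, toy_completion_exit, toy_adaptor_finish, toy_close and run_scenario are all hypothetical stand-ins. The model assumes that the completion thread holds one reference which it drops on exit, that dropping the last reference after close_requested is set re-enters close (as the api_epilog frame in the stack trace suggests), and that the IO thread is already gone. The exit_in_window flag forces the completion thread to exit between the reference check and adaptor_finish.

```c
/* Toy stand-in for zhandle_t. Hypothetical; not the real client structure. */
typedef struct {
    int ref_count;          /* references held by threads and API calls */
    int close_requested;    /* set once a close has been requested */
    int completion_running; /* 1 while the modelled completion thread is alive */
    int close_calls;        /* how many times toy_close was entered */
    int destroy_calls;      /* how many times teardown was reached */
} toy_handle;

static void toy_close(toy_handle *h, int exit_in_window);

/* Models api_prolog: take a reference. */
static void toy_prolog(toy_handle *h) { h->ref_count++; }

/* Models api_epilog: drop a reference; the last one out re-enters close. */
static void toy_epilog(toy_handle *h)
{
    if (--h->ref_count == 0 && h->close_requested)
        toy_close(h, 0);
}

/* Models the completion thread leaving its loop and dropping its reference. */
static void toy_completion_exit(toy_handle *h)
{
    if (h->completion_running) {
        h->completion_running = 0;
        toy_epilog(h);
    }
}

/* Models adaptor_finish: pin the handle, join the thread, unpin. */
static void toy_adaptor_finish(toy_handle *h)
{
    toy_prolog(h);
    toy_completion_exit(h); /* "join": the thread exits here if still alive */
    toy_epilog(h);
}

/* Models zookeeper_close up to the point where teardown would start. */
static void toy_close(toy_handle *h, int exit_in_window)
{
    h->close_calls++;
    h->close_requested = 1;
    if (h->ref_count != 0) {
        /* Suspected race window: the completion thread may finish here. */
        if (exit_in_window)
            toy_completion_exit(h);
        toy_adaptor_finish(h);
        return;
    }
    h->destroy_calls++; /* stands in for destroy(zh) and the rest of teardown */
}

/* Runs one close on a fresh handle whose only reference belongs to the
 * completion thread, and returns the final counters. */
toy_handle run_scenario(int exit_in_window)
{
    toy_handle h = { .ref_count = 1, .completion_running = 1 };
    toy_close(&h, exit_in_window);
    return h;
}
```

Under these assumptions, run_scenario(0) ends with two close calls and one destroy, while run_scenario(1) ends with three close calls and two destroys, which matches the counts observed above. This only shows that the hypothesis is consistent with those counts; it does not prove that the real client follows this path.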
> Hang in zookeeper_close() in the multi-threaded C client
> --------------------------------------------------------
>
> Key: ZOOKEEPER-981
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-981
> Project: ZooKeeper
> Issue Type: Bug
> Components: c client
> Affects Versions: 3.3.2
> Environment: Debian Squeeze, Linux 2.6.32-5, x86_64
> Reporter: Jeremy Stribling
> Priority: Critical
> Fix For: 3.3.3, 3.4.0
>
>
> I saw a hang once when my C++ application called the zookeeper_close() method
> of the multi-threaded Zookeeper client library. The stack trace of the hung
> thread was the following:
> {quote}
> Thread 8 (Thread 5644):
> #0 0x00007f5d7bb5bbe4 in __lll_lock_wait () from /lib/libpthread.so.0
> #1 0x00007f5d7bb59ad0 in pthread_cond_broadcast@@GLIBC_2.3.2 () from
> /lib/libpthread.so.0
> #2 0x00007f5d793628f6 in unlock_completion_list (l=0x32b4d68) at
> .../zookeeper/src/c/src/mt_adaptor.c:66
> #3 0x00007f5d79354d4b in free_completions (zh=0x32b4c80, callCompletion=1,
> reason=-116) at .../zookeeper/src/c/src/zookeeper.c:1069
> #4 0x00007f5d79355008 in cleanup_bufs (zh=0x32b4c80, callCompletion=1,
> rc=-116) at .../thirdparty/zookeeper/src/c/src/zookeeper.c:1125
> #5 0x00007f5d79353200 in destroy (zh=0x32b4c80) at
> .../thirdparty/zookeeper/src/c/src/zookeeper.c:366
> #6 0x00007f5d79358e0e in zookeeper_close (zh=0x32b4c80) at
> .../zookeeper/src/c/src/zookeeper.c:2326
> #7 0x00007f5d79356d18 in api_epilog (zh=0x32b4c80, rc=0) at
> .../zookeeper/src/c/src/zookeeper.c:1661
> #8 0x00007f5d79362f2f in adaptor_finish (zh=0x32b4c80) at
> .../zookeeper/src/c/src/mt_adaptor.c:205
> #9 0x00007f5d79358c8c in zookeeper_close (zh=0x32b4c80) at
> .../zookeeper/src/c/src/zookeeper.c:2297
> ...
> {quote}
> The omitted part of the stack trace is entirely within my application, and
> contains no other calls to/from the Zookeeper client. In particular, I am
> not calling zookeeper_close() from within a completion handler or any of the
> library's threads.
> I haven't been able to reproduce this, and when I encountered this I wasn't
> capturing logging from the client library, so unfortunately I don't have any
> more information at this time. But I will update this JIRA if I see it again.