[ https://issues.apache.org/jira/browse/ZOOKEEPER-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002912#comment-13002912 ]
Jeremy Stribling commented on ZOOKEEPER-981:
--------------------------------------------
When I run tsulin's test under valgrind on my system with ZooKeeper 3.3.3,
after only 5 iterations I see errors on already-freed memory (a double free
of the handle) like this:
{noformat}
==10430== Invalid read of size 8
==10430== at 0x405823: free_completions (zookeeper.c:1066)
==10430== by 0x405B4F: cleanup_bufs (zookeeper.c:1125)
==10430== by 0x403D47: destroy (zookeeper.c:366)
==10430== by 0x409961: zookeeper_close (zookeeper.c:2327)
==10430== by 0x40785F: api_epilog (zookeeper.c:1661)
==10430== by 0x413A82: adaptor_finish (mt_adaptor.c:205)
==10430== by 0x4097DF: zookeeper_close (zookeeper.c:2298)
==10430== by 0x4036AC: zookeeper_client::~zookeeper_client() (ZOOKEEPER-981.cpp:54)
==10430== by 0x403325: main (ZOOKEEPER-981.cpp:112)
==10430== Address 0x5b5c0e8 is 296 bytes inside a block of size 728 free'd
==10430== at 0x4C240FD: free (vg_replace_malloc.c:366)
==10430== by 0x409979: zookeeper_close (zookeeper.c:2329)
==10430== by 0x40785F: api_epilog (zookeeper.c:1661)
==10430== by 0x414083: do_completion (mt_adaptor.c:335)
==10430== by 0x4E2F8B9: start_thread (pthread_create.c:300)
==10430== by 0x58C002C: clone (clone.S:112)
{noformat}
However, with the attached patch (zookeeper-981.patch) applied, I was able to
run the test over 100 times without any valgrind errors.
The patch itself probably isn't good enough as it is -- it's a mismatch of
inc_ref_counter and api_epilog. But I thought I'd post it here until a more
knowledgeable ZK developer can make a proper one.
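
To illustrate the general idea (this is not the attached patch, and not the
real client code), here is a minimal single-threaded sketch of the
reference-counting rule involved: whoever drops the last reference after a
close has been requested frees the handle, and the closing caller holds its
own reference while it is still touching the handle. Every name below
(toy_handle, toy_prolog, toy_epilog, toy_close) is hypothetical; a real
multi-threaded client would also need an atomic or lock-protected counter.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the client handle; NOT the real zhandle_t. */
typedef struct toy_handle {
    int refs;             /* API calls / callbacks currently using the handle */
    int close_requested;  /* set once the application has asked to close */
    int *destroy_count;   /* lets the caller observe how often teardown ran */
} toy_handle;

static toy_handle *toy_open(int *destroy_count) {
    toy_handle *h = malloc(sizeof *h);
    if (h == NULL)
        return NULL;
    h->refs = 0;
    h->close_requested = 0;
    h->destroy_count = destroy_count;
    return h;
}

/* Teardown: record the call, then release the memory. */
static void toy_destroy(toy_handle *h) {
    ++*h->destroy_count;
    free(h);
}

/* Enter the library: take a reference. */
static void toy_prolog(toy_handle *h) {
    h->refs++;
}

/* Leave the library: drop a reference. The last one out after a close
 * request tears the handle down. Returns 1 if the handle was freed. */
static int toy_epilog(toy_handle *h) {
    assert(h->refs > 0);
    if (--h->refs == 0 && h->close_requested) {
        toy_destroy(h);
        return 1;
    }
    return 0;
}

/* Close: pin the handle with our own reference while shutdown work runs,
 * then release it through the same epilog path, so that exactly one
 * caller ends up freeing the handle. Returns 1 if this call freed it. */
static int toy_close(toy_handle *h) {
    h->close_requested = 1;
    toy_prolog(h);
    /* shutdown work that still touches *h would go here */
    return toy_epilog(h);
}
```

If a callback still holds a reference when toy_close() runs, the close
returns without freeing, and the callback's own toy_epilog() performs the
single teardown. With no callback in flight, toy_close() frees the handle
itself.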
> Hang in zookeeper_close() in the multi-threaded C client
> --------------------------------------------------------
>
> Key: ZOOKEEPER-981
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-981
> Project: ZooKeeper
> Issue Type: Bug
> Components: c client
> Affects Versions: 3.3.2
> Environment: Debian Squeeze, Linux 2.6.32-5, x86_64
> Reporter: Jeremy Stribling
> Priority: Critical
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-981.tar.gz, zookeeper-981.patch
>
>
> I saw a hang once when my C++ application called the zookeeper_close() method
> of the multi-threaded Zookeeper client library. The stack trace of the hung
> thread was the following:
> {quote}
> Thread 8 (Thread 5644):
> #0 0x00007f5d7bb5bbe4 in __lll_lock_wait () from /lib/libpthread.so.0
> #1 0x00007f5d7bb59ad0 in pthread_cond_broadcast@@GLIBC_2.3.2 () from /lib/libpthread.so.0
> #2 0x00007f5d793628f6 in unlock_completion_list (l=0x32b4d68) at .../zookeeper/src/c/src/mt_adaptor.c:66
> #3 0x00007f5d79354d4b in free_completions (zh=0x32b4c80, callCompletion=1, reason=-116) at .../zookeeper/src/c/src/zookeeper.c:1069
> #4 0x00007f5d79355008 in cleanup_bufs (zh=0x32b4c80, callCompletion=1, rc=-116) at .../thirdparty/zookeeper/src/c/src/zookeeper.c:1125
> #5 0x00007f5d79353200 in destroy (zh=0x32b4c80) at .../thirdparty/zookeeper/src/c/src/zookeeper.c:366
> #6 0x00007f5d79358e0e in zookeeper_close (zh=0x32b4c80) at .../zookeeper/src/c/src/zookeeper.c:2326
> #7 0x00007f5d79356d18 in api_epilog (zh=0x32b4c80, rc=0) at .../zookeeper/src/c/src/zookeeper.c:1661
> #8 0x00007f5d79362f2f in adaptor_finish (zh=0x32b4c80) at .../zookeeper/src/c/src/mt_adaptor.c:205
> #9 0x00007f5d79358c8c in zookeeper_close (zh=0x32b4c80) at .../zookeeper/src/c/src/zookeeper.c:2297
> ...
> {quote}
> The omitted part of the stack trace is entirely within my application, and
> contains no other calls to/from the Zookeeper client. In particular, I am
> not calling zookeeper_close() from within a completion handler or any of the
> library's threads.
> I haven't been able to reproduce this, and when I encountered this I wasn't
> capturing logging from the client library, so unfortunately I don't have any
> more information at this time. But I will update this JIRA if I see it again.