[ https://issues.apache.org/jira/browse/ZOOKEEPER-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152858#comment-13152858 ]
helei commented on ZOOKEEPER-981:
---------------------------------

Sorry for not responding in time. I saw another problem with this patch applied: a hang in zookeeper_close() again. Here is the stack:

(gdb) bt
#0  0x000000302b80adfb in __lll_mutex_lock_wait () from /lib64/tls/libpthread.so.0
#1  0x000000302b1307a8 in main_arena () from /lib64/tls/libc.so.6
#2  0x000000302b910230 in stack_used () from /lib64/tls/libpthread.so.0
#3  0x000000302b808dde in pthread_cond_broadcast@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
#4  0x00000000006b4ce8 in adaptor_finish (zh=0x6902060) at src/mt_adaptor.c:217
#5  0x00000000006b0fd0 in zookeeper_close (zh=0x6902060) at src/zookeeper.c:2297
(gdb) p zh->ref_counter
$5 = 1
(gdb) p zh->close_requested
$6 = 1
(gdb) p *zh
$7 = {fd = 110112576, hostname = 0x6903620 "", addrs = 0x0, addrs_count = 1,
  watcher = 0x62e5dc <doris::meta_register_mgr_t::register_mgr_watcher(_zhandle*, int, int, char const*, void*)>,
  last_recv = {tv_sec = 1321510694, tv_usec = 552835},
  last_send = {tv_sec = 1321510694, tv_usec = 552886},
  last_ping = {tv_sec = 1321510685, tv_usec = 774869},
  next_deadline = {tv_sec = 1321510704, tv_usec = 547831},
  recv_timeout = 30000, input_buffer = 0x0,
  to_process = {head = 0x0, last = 0x0, lock = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}},
  to_send = {head = 0x0, last = 0x0, lock = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 1, __m_lock = {__status = 0, __spinlock = 0}}},
  sent_requests = {head = 0x0, last = 0x0, cond = {__c_lock = {__status = 1, __spinlock = -1}, __c_waiting = 0x0, __padding = '\0' <repeats 15 times>, __align = 0}, lock = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}},
  completions_to_process = {head = 0x2aefbff800, last = 0x2af0e05f40, cond = {__c_lock = {__status = 592705486850, __spinlock = -1}, __c_waiting = 0x45, __padding = "E\000\000\000\000\000\000\000\220\006\000\000\000", __align = 296352743424}, lock = {__m_reserved = 1, __m_count = 0, __m_owner = 0x1000026ca, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}},
  connect_index = 0,
  client_id = {client_id = 86551148676999146, passwd = "G懵擀\233\213\f闬202筴\002錪\034"},
  last_zxid = 82057372, outstanding_sync = 0,
  primer_buffer = {buffer = 0x6902290 "", len = 40, curr_offset = 44, next = 0x0},
  primer_storage = {len = 36, protocolVersion = 0, timeOut = 30000, sessionId = 86551148676999146, passwd_len = 16, passwd = "G懵擀\233\213\f闬202筴\002錪\034"},
  primer_storage_buffer = "\000\000\000$\000\000\000\000\000\000u0\0013}惜薵闬000\000\000\020G懵擀\233\213\f闬202筴\002錪\034",
  state = 0, context = 0x0,
  auth_h = {auth = 0x0, lock = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}},
  ref_counter = 1, close_requested = 1, adaptor_priv = 0x0,
  socket_readable = {tv_sec = 0, tv_usec = 0},
  active_node_watchers = 0x6901520, active_exist_watchers = 0x69015d0, active_child_watchers = 0x6902ef0,
  chroot = 0x0}

I think the ref_counter is supposed to be 2 or 3 here; 1 does not seem correct.

Thanks again.

> Hang in zookeeper_close() in the multi-threaded C client
> --------------------------------------------------------
>
> Key: ZOOKEEPER-981
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-981
> Project: ZooKeeper
> Issue Type: Bug
> Components: c client
> Affects Versions: 3.3.2
> Environment: Debian Squeeze, Linux 2.6.32-5, x86_64
> Reporter: Jeremy Stribling
> Assignee: Jeremy Stribling
> Priority: Critical
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-981-v1.patch, ZOOKEEPER-981.tar.gz,
> zookeeper-981.patch
>
>
> I saw a hang once when my C++ application called the zookeeper_close() method
> of the multi-threaded Zookeeper client library.
> The stack trace of the hung thread was the following:
> {quote}
> Thread 8 (Thread 5644):
> #0 0x00007f5d7bb5bbe4 in __lll_lock_wait () from /lib/libpthread.so.0
> #1 0x00007f5d7bb59ad0 in pthread_cond_broadcast@@GLIBC_2.3.2 () from /lib/libpthread.so.0
> #2 0x00007f5d793628f6 in unlock_completion_list (l=0x32b4d68) at .../zookeeper/src/c/src/mt_adaptor.c:66
> #3 0x00007f5d79354d4b in free_completions (zh=0x32b4c80, callCompletion=1, reason=-116) at .../zookeeper/src/c/src/zookeeper.c:1069
> #4 0x00007f5d79355008 in cleanup_bufs (zh=0x32b4c80, callCompletion=1, rc=-116) at .../thirdparty/zookeeper/src/c/src/zookeeper.c:1125
> #5 0x00007f5d79353200 in destroy (zh=0x32b4c80) at .../thirdparty/zookeeper/src/c/src/zookeeper.c:366
> #6 0x00007f5d79358e0e in zookeeper_close (zh=0x32b4c80) at .../zookeeper/src/c/src/zookeeper.c:2326
> #7 0x00007f5d79356d18 in api_epilog (zh=0x32b4c80, rc=0) at .../zookeeper/src/c/src/zookeeper.c:1661
> #8 0x00007f5d79362f2f in adaptor_finish (zh=0x32b4c80) at .../zookeeper/src/c/src/mt_adaptor.c:205
> #9 0x00007f5d79358c8c in zookeeper_close (zh=0x32b4c80) at .../zookeeper/src/c/src/zookeeper.c:2297
> ...
> {quote}
> The omitted part of the stack trace is entirely within my application, and
> contains no other calls to/from the Zookeeper client. In particular, I am
> not calling zookeeper_close() from within a completion handler or any of the
> library's threads.
> I haven't been able to reproduce this, and when I encountered this I wasn't
> capturing logging from the client library, so unfortunately I don't have any
> more information at this time. But I will update this JIRA if I see it again.