Hi Parag,

I have seen a crash from init_ssl_for_socket() in the 3.6.2 C client when using mTLS. The root cause was that the `SSL_library_init` call is not thread safe, and it is called from init_ssl_for_socket(). I am not sure whether you are hitting the same issue. If so, you could protect the init call with a lock. The backtrace from my case is at the end of this message.
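To illustrate the locking idea, here is a minimal sketch. It does not touch the ZooKeeper or OpenSSL sources: `unsafe_library_init()` is a hypothetical stand-in for a one-time initializer such as `SSL_library_init`, and `ensure_library_init()` and `run_workers()` are names made up for this example. The only point is that a mutex plus an "already done" flag lets many threads race to the init call while it runs exactly once.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_WORKERS 16

/* Hypothetical stand-in for a non-thread-safe one-time initializer
 * (for example SSL_library_init). It only counts how often it runs. */
static int init_calls = 0;
static void unsafe_library_init(void) { init_calls++; }

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static int initialized = 0;

/* Safe to call from any thread: the lock serializes callers and the
 * flag makes sure the initializer body runs only once. */
static void ensure_library_init(void) {
    pthread_mutex_lock(&init_lock);
    if (!initialized) {
        unsafe_library_init();
        initialized = 1;
    }
    pthread_mutex_unlock(&init_lock);
}

static void *worker(void *arg) {
    (void)arg;
    ensure_library_init();
    return NULL;
}

/* Starts n threads that all try to initialize, waits for them, and
 * returns how many times the initializer actually ran. */
static int run_workers(int n) {
    pthread_t tids[MAX_WORKERS];
    int started = 0;

    assert(n > 0 && n <= MAX_WORKERS);
    for (int i = 0; i < n; i++) {
        if (pthread_create(&tids[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return init_calls;
}
```

In a real client the call inside the guarded section would be the actual library init. `pthread_once` would be an alternative to the explicit mutex and flag.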
(gdb-9.1-490) bt
#0  0x00007f15324bb7bb in raise () from ...lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f1532456535 in abort () from .../lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f1533073abf in OpenSSLDie () from ...lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#3  0x00007f1532a97a68 in ?? () from .../lib/x86_64-linux-gnu/libssl.so.1.0.0
#4  0x00007f3432d9af2b in SSL_library_init () from .../x86_64-linux-gnu/libssl.so.1.0.0
…

-Thanks

From: Mulay, Parag Bhausaheb (Parag) <para...@avaya.com>
Date: Monday, April 4, 2022 at 7:40 AM
To: dev@zookeeper.apache.org <dev@zookeeper.apache.org>
Subject: RE: [External]Zookeeper C-client library getting stuck on pthread_join() call.

Hi All,

I am sorry for flooding your mailboxes. I am not sure whether I am sending this to an incorrect group. Any help or suggestions regarding this would be really helpful.

Thanks
-Parag

-----Original Message-----
From: Mulay, Parag Bhausaheb (Parag) <para...@avaya.com>
Sent: Wednesday, March 30, 2022 8:04 AM
To: dev@zookeeper.apache.org
Subject: Re: [External]Zookeeper C-client library getting stuck on pthread_join() call.

Hi All,

Any suggestions about this? It would be of great help.

Thanks in advance,
-Parag

________________________________
From: Mulay, Parag Bhausaheb (Parag)
Sent: Monday, March 28, 2022 11:18:33 AM
To: dev@zookeeper.apache.org <dev@zookeeper.apache.org>
Subject: RE: [External]Zookeeper C-client library getting stuck on pthread_join() call.

It seems the mail loses its formatting. The code I added was the check for "close_requested"; the rest is existing code.

Thanks
-Parag

-----Original Message-----
From: Mulay, Parag Bhausaheb (Parag) <para...@avaya.com>
Sent: Monday, March 28, 2022 11:12 AM
To: dev@zookeeper.apache.org
Subject: [External]Zookeeper C-client library getting stuck on pthread_join() call.
Hi All,

I am using the Zookeeper C-client library, version 3.6.2, to connect to ZK servers. The problem I am facing is that the library gets stuck in a pthread_join() call and never returns.

The scenario is as follows:

* The Zookeeper C-client connects to zookeeper over an mTLS connection.
* The client loses network connectivity to the zookeeper servers.
* During this time the zookeeper client code calls zookeeper_close().
* zookeeper_close() never returns.
* The state of the zh handle during this time is ZOO_SSL_CONNECTING_STATE.

I made the program dump core while it was stuck in this state. The backtrace shows that zookeeper_close() calls adaptor_finish(), which gets stuck in the pthread_join() call for the IO thread. This indicates that the IO thread was itself stuck. The backtrace for the IO thread shows this call chain:

    do_io() -> zookeeper_process() -> check_events() -> init_ssl_for_handler() -> init_ssl_for_socket()

Looking at the code, there is a while(1) loop in init_ssl_for_socket(). I added the check marked below, and it seems to have fixed the problem for me. Can anybody confirm whether this is correct, or whether this problem has already been fixed in other releases?

    while(1) {
        int rc;
        int sock = fd->sock;
        struct timeval tv;
        fd_set s_rfds, s_wfds;
        tv.tv_sec = 1;
        tv.tv_usec = 0;
        FD_ZERO(&s_rfds);
        FD_ZERO(&s_wfds);
        /* added check: bail out if a close has been requested */
        if(zh->close_requested) {
            return ZSSLCONNECTIONERROR;
        }
        /* ... rest of the existing loop ... */
    }

Many Thanks,
-Parag
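For readers following the thread, the pattern behind the added check can be shown in isolation. This is a sketch under stated assumptions, not the ZooKeeper code: `struct handle`, `wait_readable_or_close()`, the `WAIT_*` codes and the demo helper are all hypothetical names, and an atomic flag stands in for `zh->close_requested`. It shows a `select` loop with a short timeout that re-checks the flag on every pass, so a thread waiting on an unresponsive peer can still be joined.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the client handle; only the flag matters. */
struct handle {
    atomic_int close_requested;
};

enum { WAIT_READY = 0, WAIT_CLOSED = -1, WAIT_ERROR = -2 };

/* Waits until fd is readable, but gives up as soon as a close has been
 * requested. The select timeout bounds how long a request goes unnoticed. */
static int wait_readable_or_close(struct handle *h, int fd) {
    assert(fd >= 0 && fd < FD_SETSIZE);
    while (1) {
        struct timeval tv;
        fd_set rfds;
        int rc;

        if (atomic_load(&h->close_requested))
            return WAIT_CLOSED;

        tv.tv_sec = 0;
        tv.tv_usec = 50 * 1000;          /* 50 ms per pass */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        rc = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (rc > 0)
            return WAIT_READY;
        if (rc < 0 && errno != EINTR)
            return WAIT_ERROR;
        /* timeout or EINTR: loop again and re-check the flag */
    }
}

/* Plays the role of the thread that requests the close. */
static void *request_close(void *arg) {
    struct handle *h = arg;
    struct timespec delay = { 0, 100 * 1000 * 1000 };  /* 100 ms */
    nanosleep(&delay, NULL);
    atomic_store(&h->close_requested, 1);
    return NULL;
}

/* Waits on a pipe that never receives data, like an unreachable server,
 * while another thread requests the close. */
static int demo_close_while_waiting(void) {
    int fds[2];
    struct handle h;
    pthread_t closer;
    int rc;

    atomic_init(&h.close_requested, 0);
    if (pipe(fds) != 0)
        return WAIT_ERROR;
    if (pthread_create(&closer, NULL, request_close, &h) != 0) {
        close(fds[0]);
        close(fds[1]);
        return WAIT_ERROR;
    }
    rc = wait_readable_or_close(&h, fds[0]);
    pthread_join(closer, NULL);
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

Without the flag check at the top of the loop, the waiting thread in this demo would spin on timeouts forever, which matches the hang described above. Whether the real handle needs extra synchronization around its flag is not something this sketch settles.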