Hi Azat, thanks for looking into this.
On Mon, Jan 22, 2018 at 9:27 PM, Azat Khuzhin <[email protected]> wrote: > On Mon, Jan 22, 2018 at 09:55:49AM +0100, Alexander Rojas wrote: > > Hey guys, > > > > Over the past couple of weeks I've been struggling with an old bug of > Apache Mesos related to our implementation of SSL Sockets which are based > on libevent, you can see the whole implementation in [1] and [2]. It > affects later versions of Linux, like Ubuntu 17.10 and OSX, full > description of the packages at MESOS-8271 [3]. > > Hi Alexander, > > Can you try with openssl 1.1? (from the issue I saw openssl 1.0) > I did the update, the problem changed from the accept to the recv call. While now it can `accept()` socket connections, the `recv` callback fails. I need to investigate further here, but it behaves just like the macosx problem. > > > The problem can be triggered simply by updating libevent from 2.0.22 to > 2.18. > > Wow, there were a lot of changes since 2.0.22. So I don't think that > bisect is a good idea here. > > > In linux we see the issue when we call `accept()`. When we reach the > event callback we keep getting `events & BEV_EVENT_ERROR && events & > BEV_EVENT_READ` [4] however `EVUTIL_SOCKET_ERROR()` keeps returning 0, and > `bufferevent_get_openssl_error(bev)` also shows no error. > > > > On the client we find that the connection was closed, but the symptoms > are similar, we get `BEV_EVENT_ERROR` but no other diagnostics. > > > > I include a full stack trace of a run where the problem appears (Line > numbers may be somewhat off due to debug output lines added). Any more info > I am happy to contribute. > > > > ``` > > process::network::internal::LibeventSSLSocketImpl::accept_ > SSL_callback(process::network::internal::LibeventSSLSocketImpl:: > AcceptRequest*)::$_10::operator()(bufferevent*, short, void*) const > (this=0x160eae0, bev=0x7fffe4001a10, events=33, arg=0x7fffc4000c10) at > ../../../3rdparty/libprocess/src/libevent_ssl_socket.cpp:1152 > > process::network::internal::LibeventSSLSocketImpl::accept_ > SSL_callback(process::network::internal::LibeventSSLSocketImpl:: > AcceptRequest*)::$_10::__invoke(bufferevent*, short, void*) > (bev=0x7fffe4001a10, events=33, arg=0x7fffc4000c10) at > ../../../3rdparty/libprocess/src/libevent_ssl_socket.cpp:1143 > > ?? () from /usr/lib/x86_64-linux-gnu/libevent_openssl-2.1.so.6 > > ?? () from /usr/lib/x86_64-linux-gnu/libevent_openssl-2.1.so.6 > > ?? () from /usr/lib/x86_64-linux-gnu/libevent-2.1.so.6 > > event_base_loop () from /usr/lib/x86_64-linux-gnu/libevent-2.1.so.6 > > Can you install debug symbols for libevent? So that we can see line > numbers? But it looks like it called from do_read() -> conn_closed(), if > this is true I need "err" that obtained from SSL_get_error() in > do_read(). > I installed debug symbols, this is a newer stack trace: ``` #0 process::network::internal::LibeventSSLSocketImpl::accept_SSL_callback(process::network::internal::LibeventSSLSocketImpl::AcceptRequest*)::$_10::operator()(bufferevent*, short, void*) const (this=0x1608fe0, bev=0x7fffe4001a10, events=33, arg=0x7fffc8000c10) at ../../../3rdparty/libprocess/src/libevent_ssl_socket.cpp:1190 #1 0x0000000000d1876a in process::network::internal::LibeventSSLSocketImpl::accept_SSL_callback(process::network::internal::LibeventSSLSocketImpl::AcceptRequest*)::$_10::__invoke(bufferevent*, short, void*) (bev=0x7fffe4001a10, events=33, arg=0x7fffc8000c10) at ../../../3rdparty/libprocess/src/libevent_ssl_socket.cpp:1101 #2 0x00007ffff725dc48 in do_handshake (bev_ssl=0x7fffe4001a10) at bufferevent_openssl.c:1053 #3 0x00007ffff725dd80 in be_openssl_handshakeeventcb (fd=<optimized out>, what=<optimized out>, ptr=0x7fffe4001a10) at bufferevent_openssl.c:1075 #4 0x00007ffff65616aa in event_persist_closure (ev=<optimized out>, base=0x1608d40) at event.c:1580 #5 event_process_active_single_queue (base=base@entry=0x1608d40, activeq=0x15c98a0, max_to_process=max_to_process@entry=2147483647, endtime=endtime@entry=0x0) at event.c:1639 #6 0x00007ffff6562227 in event_process_active (base=0x1608d40) at event.c:1738 #7 event_base_loop (base=0x1608d40, flags=<optimized out>) at event.c:1961 #8 0x0000000000d4f8f5 in process::EventLoop::run () at ../../../3rdparty/libprocess/src/libevent.cpp:98 #9 0x0000000000c82b47 in std::__invoke_impl<void, void (*)()> (__f=@0x15cb3e8: 0xd4f870 <process::EventLoop::run()>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../../include/c++/7.2.0/bits/invoke.h:60 #10 0x0000000000c82add in std::__invoke<void (*)()> (__fn=@0x15cb3e8: 0xd4f870 <process::EventLoop::run()>) at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../../include/c++/7.2.0/bits/invoke.h:95 #11 0x0000000000c82ab5 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x15cb3e8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../../include/c++/7.2.0/thread:234 #12 0x0000000000c82a85 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x15cb3e8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../../include/c++/7.2.0/thread:243 #13 0x0000000000c82979 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x15cb3e0) at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../../include/c++/7.2.0/thread:186 #14 0x00007ffff5bc70ff in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #15 0x00007ffff7bbd7fc in start_thread (arg=0x7fffe9e6b700) at pthread_create.c:465 #16 0x00007ffff52d2b5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 ``` > > And the next step will be to try to turn BEV_OPT_DEFER_CALLBACKS off (so > that we could see more reliable call sequence). > Will try later in the day. > > > process::EventLoop::run () at ../../../3rdparty/libprocess/ > src/libevent.cpp:98 > > Thanks, > Azat. > *********************************************************************** > To unsubscribe, send an e-mail to [email protected] with > unsubscribe libevent-users in the body. >
