[
https://issues.apache.org/jira/browse/TS-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Susan Hinrichs reopened TS-3266:
--------------------------------
Assignee: Susan Hinrichs
We saw a stack trace very similar to this on multiple machines while trying
5.3.0 in production.
It looks like the problem is that the vc->read.vio.mutex is not the same as the
vc->read.vio._cont->mutex, but the code assumes they are the same.
In UnixNetVConnection, the read_from_net function checks that these
mutexes are the same and reschedules itself if they differ.
It seems reasonable to make a similar check in SSLNetVConnection::net_read_io.
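
A rough sketch of the kind of check described above, for illustration only.
Every type and function name here (Continuation, VIO, ReadResult,
net_read_io_sketch) is a hypothetical stand-in built on the standard
library, not the actual Traffic Server code, and "rescheduled" is modelled
as a return value rather than a real call into the net handler.

```cpp
#include <memory>
#include <mutex>

// Hypothetical stand-ins; these are NOT the Traffic Server classes.
struct Continuation {
  std::shared_ptr<std::mutex> mutex;
};

struct VIO {
  std::shared_ptr<std::mutex> mutex; // mutex recorded on the VIO
  Continuation *cont;                // continuation to be signalled
};

enum class ReadResult { Rescheduled, Signaled };

// Try to take the VIO mutex. Signal the continuation only if the lock was
// obtained AND the continuation is protected by that same mutex; otherwise
// back off so the read can be retried later.
ReadResult net_read_io_sketch(VIO &vio) {
  if (!vio.mutex || vio.cont == nullptr) {
    return ReadResult::Rescheduled;
  }
  std::unique_lock<std::mutex> lock(*vio.mutex, std::try_to_lock);
  if (!lock.owns_lock() || vio.cont->mutex != vio.mutex) {
    // Either contended, or the continuation's mutex differs from the one
    // we hold: calling its handler now would run it without its lock.
    return ReadResult::Rescheduled;
  }
  // Here it would be safe to call the continuation's handler.
  return ReadResult::Signaled;
}
```

The point of comparing the two mutexes, rather than only testing whether the
try-lock succeeded, is that holding vio.mutex says nothing about the
continuation if its mutex has since been swapped for a different one.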
> core dump in UnixNetProcessor::connect_re_internal
> --------------------------------------------------
>
> Key: TS-3266
> URL: https://issues.apache.org/jira/browse/TS-3266
> Project: Traffic Server
> Issue Type: Bug
> Components: Core
> Affects Versions: 5.2.0
> Reporter: Sudheer Vinukonda
> Assignee: Susan Hinrichs
> Labels: crash
>
> See a new core dump in v5.2.0 after running stable for over 48 hours. Below
> is the bt and some gdb info.
> {code}
> (gdb) bt
> #0 0x0000000000773056 in EThread::is_event_type (this=0x0, et=2) at
> UnixEThread.cc:121
> #1 0x0000000000750cfe in UnixNetProcessor::connect_re_internal
> (this=0x1032fc0, cont=0x2aad4590d080, target=0x2aad4590d728,
> opt=0x2aac8b367600) at UnixNetProcessor.cc:247
> #2 0x000000000052b498 in NetProcessor::connect_re (this=0x1032fc0,
> cont=0x2aad4590d080, addr=0x2aad4590d728, opts=0x2aac8b367600) at
> ../iocore/net/P_UnixNetProcessor.h:85
> #3 0x00000000005e64e5 in HttpSM::do_http_server_open (this=0x2aad4590d080,
> raw=false) at HttpSM.cc:4796
> #4 0x00000000005edec7 in HttpSM::set_next_state (this=0x2aad4590d080) at
> HttpSM.cc:7141
> #5 0x00000000005ed2f2 in HttpSM::call_transact_and_set_next_state
> (this=0x2aad4590d080, f=0x607320
> <HttpTransact::HandleResponse(HttpTransact::State*)>) at HttpSM.cc:6961
> #6 0x00000000005e7b72 in HttpSM::handle_server_setup_error
> (this=0x2aad4590d080, event=104, data=0x2aade42edae8) at HttpSM.cc:5308
> #7 0x00000000005dc57c in HttpSM::state_send_server_request_header
> (this=0x2aad4590d080, event=104, data=0x2aade42edae8) at HttpSM.cc:1989
> #8 0x00000000005de6a2 in HttpSM::main_handler (this=0x2aad4590d080,
> event=104, data=0x2aade42edae8) at HttpSM.cc:2570
> #9 0x0000000000502eae in Continuation::handleEvent (this=0x2aad4590d080,
> event=104, data=0x2aade42edae8) at ../iocore/eventsystem/I_Continuation.h:146
> #10 0x00000000007524c3 in read_signal_and_update (event=104,
> vc=0x2aade42ed9d0) at UnixNetVConnection.cc:138
> #11 0x000000000075261e in read_signal_done (event=104, nh=0x2aac89a53ad0,
> vc=0x2aade42ed9d0) at UnixNetVConnection.cc:169
> #12 0x0000000000754cd4 in UnixNetVConnection::readSignalDone
> (this=0x2aade42ed9d0, event=104, nh=0x2aac89a53ad0) at
> UnixNetVConnection.cc:922
> #13 0x000000000073e088 in SSLNetVConnection::net_read_io
> (this=0x2aade42ed9d0, nh=0x2aac89a53ad0, lthread=0x2aac89a50010) at
> SSLNetVConnection.cc:596
> #14 0x000000000074c50d in NetHandler::mainNetEvent (this=0x2aac89a53ad0,
> event=5, e=0x282fb30) at UnixNet.cc:399
> #15 0x0000000000502eae in Continuation::handleEvent (this=0x2aac89a53ad0,
> event=5, data=0x282fb30) at ../iocore/eventsystem/I_Continuation.h:146
> #16 0x0000000000773172 in EThread::process_event (this=0x2aac89a50010,
> e=0x282fb30, calling_code=5) at UnixEThread.cc:144
> #17 0x000000000077367c in EThread::execute (this=0x2aac89a50010) at
> UnixEThread.cc:268
> #18 0x000000000077272d in spawn_thread_internal (a=0x2e1b740) at Thread.cc:88
> #19 0x00002aabd3d04851 in start_thread () from /lib64/libpthread.so.0
> #20 0x0000003296ee890d in clone () from /lib64/libc.so.6
> {code}
> {code}
> (gdb) frame 1
> #1 0x0000000000750cfe in UnixNetProcessor::connect_re_internal
> (this=0x1032fc0, cont=0x2aad4590d080, target=0x2aad4590d728,
> opt=0x2aac8b367600) at UnixNetProcessor.cc:247
> 247 UnixNetProcessor.cc: No such file or directory.
> in UnixNetProcessor.cc
> (gdb) print mutex
> $28 = (ProxyMutex *) 0x2aadf004d070
> (gdb) print *mutex
> $29 = {<RefCountObj> = {<ForceVFPTToTop> = {_vptr.ForceVFPTToTop = 0x77e890},
> m_refcount = 16}, the_mutex = {__data = {__lock = 0, __count = 0, __owner =
> 0, __nusers = 0, __kind = 0,
> __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000'
> <repeats 39 times>, __align = 0}, thread_holding = 0x0, nthread_holding = 0}
> (gdb) print t
> $30 = (EThread *) 0x0
> (gdb) print cont
> $31 = (Continuation *) 0x2aad4590d080
> (gdb) print *cont
> $32 = {<force_VFPT_to_top> = {_vptr.force_VFPT_to_top = 0x7aaef0}, handler =
> (int (Continuation::*)(Continuation *, int, void *)) 0x5de4ce
> <HttpSM::main_handler(int, void*)>, mutex = {
> m_ptr = 0x2aadf004d070}, link = {<SLink<Continuation>> = {next = 0x0},
> prev = 0x0}}
> {code}