[ https://issues.apache.org/jira/browse/TS-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500065#comment-14500065 ]

Alan M. Carroll commented on TS-3522:
-------------------------------------

Looking at a core dump from this on 5.3.0.

The problem is a {{UnixNetVConnection}} whose {{write.vio._cont}} points at dead
memory ({{0xdeadbeef}}). The transaction stalls and then InactivityCop tries to
shut down the netVC. This falls through and triggers the write signalling, which
explodes on contact with the bad continuation. This is just a symptom, however;
the real problem is the bad continuation being in the VIO.
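A minimal sketch of that failure path, assuming heavily simplified stand-ins:
{{Continuation}}, {{VIO}}, {{NetVC}}, {{poison()}}, the {{magic}} field and the
liveness check are all hypothetical and are not the ATS classes. The inactivity
handler signals the write VIO, and the signal is dispatched straight through the
VIO's raw continuation pointer, so whatever that pointer addresses is trusted. A
garbage handler read from freed memory would be consistent with the crash in
frame #0. A check like this only helps in debug builds while the freed memory is
still poisoned; it does not fix the underlying lifetime bug.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, heavily simplified stand-ins for the ATS classes.
constexpr int kInactivityTimeout = 105;  // event=105 in the backtrace (assumed VC_EVENT_INACTIVITY_TIMEOUT)
constexpr std::uint32_t kLiveMagic = 0x4c495645u;    // hypothetical "alive" marker
constexpr std::uint32_t kDeadPattern = 0xdeadbeefu;  // poison pattern seen in the core

struct Continuation {
  std::uint32_t magic;                  // hypothetical liveness marker
  int (*handler)(Continuation *, int);  // stands in for the member-function handler
  int last_event;
};

struct VIO {
  Continuation *cont;  // like write.vio._cont: raw and non-owning
};

struct NetVC {
  VIO write_vio;
};

int record_event(Continuation *c, int event) {
  c->last_event = event;
  return 0;
}

// Simulates free-time poisoning by filling the object's bytes with 0xdeadbeef.
// The storage is still owned here, so reading it back stays well-defined.
void poison(Continuation *c) {
  auto *bytes = reinterpret_cast<unsigned char *>(c);
  for (std::size_t i = 0; i + sizeof kDeadPattern <= sizeof *c; i += sizeof kDeadPattern)
    std::memcpy(bytes + i, &kDeadPattern, sizeof kDeadPattern);
}

// Roughly the shape of write_signal_and_update(): dispatch straight through the
// VIO's continuation. The magic check is the hypothetical debug-only addition;
// without it a poisoned handler pointer would be called.
bool write_signal(NetVC &vc, int event) {
  Continuation *c = vc.write_vio.cont;
  if (c->magic != kLiveMagic)
    return false;  // refuse to dispatch; real code would log or assert with context
  c->handler(c, event);
  return true;
}

// Stands in for InactivityCop::check_inactivity() timing out a stalled VC.
bool inactivity_cop_fire(NetVC &vc) {
  return write_signal(vc, kInactivityTimeout);
}
```

With a live continuation, {{inactivity_cop_fire()}} delivers event 105 and
returns true; after {{poison()}} it returns false instead of jumping through the
poisoned handler pointer.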

In the case I dug into, the state machine was still alive. The transaction was a
{{POST}} (http -> http), and the state machine had apparently sent the request
and POST data to the origin and was waiting for a response. I couldn't find how
the write VIO continuation could become dead memory: all of the
{{do_io_write}} calls seem to use the {{HttpSM}} as the continuation, and the
{{HttpSM}} is clearly not dead memory. Nor does the write VIO continuation
pointer seem to match any of the relevant objects.
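One way to chase down who stored that pointer, sketched under assumptions: a
hypothetical debug-only wrapper ({{DO_IO_WRITE}}, {{do_io_write_traced()}} and
the {{set_file}}/{{set_line}} fields are not ATS APIs) records the call site of
every {{do_io_write}} next to the continuation it installs. If a core then shows
a recorded call site that passes the {{HttpSM}} while the stored pointer is
something else, the pointer was presumably changed by some path other than
{{do_io_write}}.

```cpp
#include <cassert>

// Hypothetical stand-ins; only what is needed to show the idea.
struct Continuation {
  int id;
};

struct VIO {
  Continuation *cont = nullptr;
  const char *set_file = nullptr;  // debug-only: where cont was last assigned
  int set_line = 0;
};

struct NetVC {
  VIO write_vio;

  // Stand-in for do_io_write(): installs the continuation and records the
  // caller, so the provenance sits next to the pointer in a core dump.
  VIO *do_io_write_traced(Continuation *c, const char *file, int line) {
    assert(file != nullptr && line > 0);  // the macro always supplies a call site
    write_vio.cont = c;
    write_vio.set_file = file;
    write_vio.set_line = line;
    return &write_vio;
  }
};

// Captures the call site automatically at every use.
#define DO_IO_WRITE(vc, c) (vc).do_io_write_traced((c), __FILE__, __LINE__)

// True if the write VIO still points at the expected owner (the HttpSM in the
// case above). A false result plus set_file/set_line shows the last setter.
bool write_cont_is(const NetVC &vc, const Continuation *expected) {
  return vc.write_vio.cont == expected;
}
```

Routing every write-side setup through one macro keeps the bookkeeping out of
the callers; a direct assignment to the continuation would bypass it, which is
exactly the kind of path this is meant to expose.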

Other anomalies: {{from_accept_thread}} is {{true}}, which should not be the
case; the IP address in the {{con}} member doesn't match the request; and the
{{action_}} continuation handler is {{SSLNextProtocolAccept::mainEvent}}. I
need to check whether that is appropriate for a non-SSL connection (this may be
the standard way to do protocol detection). On the other hand, this looks like
possibly bad sharing, where this connection is being used as both an origin
server connection and a user agent connection.
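The sharing hypothesis can be illustrated with object recycling. This is a
minimal sketch, assuming a freelist-style pool that hands back the most recently
freed slot; {{VcPool}}, {{release()}}, {{stale_view_sees_accepted_client()}} and
the two-field {{NetVC}} are hypothetical stand-ins, not the ATS allocator. If
origin-side code keeps a pointer to a VC after it is freed, and the storage is
reused for a newly accepted client connection, the stale pointer sees
{{from_accept_thread == true}} and the client's address.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical stand-in for UnixNetVConnection with only the fields discussed above.
struct NetVC {
  bool from_accept_thread = false;
  std::uint32_t peer_ipv4 = 0;  // stands in for the address in `con`
};

// Hypothetical pool: freed slots are handed out again, most recently freed first.
class VcPool {
 public:
  NetVC *alloc() {
    void *mem;
    if (!free_.empty()) {
      mem = free_.back();
      free_.pop_back();
    } else {
      assert(used_ < kSlots);
      mem = slots_[used_++];
    }
    return ::new (mem) NetVC();  // construct a fresh NetVC in the slot
  }

  void release(NetVC *vc) {
    vc->~NetVC();         // the object is dead, but the storage is kept...
    free_.push_back(vc);  // ...and queued for the next alloc()
  }

 private:
  static constexpr int kSlots = 4;
  alignas(NetVC) unsigned char slots_[kSlots][sizeof(NetVC)];
  int used_ = 0;
  std::vector<void *> free_;
};

// Replays the suspected sequence and reports what the stale origin-side
// pointer observes afterwards.
bool stale_view_sees_accepted_client() {
  VcPool pool;
  NetVC *origin_vc = pool.alloc();     // outbound connection to the origin
  origin_vc->peer_ipv4 = 0x0A000001u;  // 10.0.0.1 (made-up origin address)
  pool.release(origin_vc);             // closed, but the pointer is kept
  NetVC *ua_vc = pool.alloc();         // storage reused for an inbound client
  ua_vc->from_accept_thread = true;
  ua_vc->peer_ipv4 = 0xC0A80005u;      // 192.168.0.5 (made-up client address)
  return ua_vc == origin_vc && origin_vc->from_accept_thread &&
         origin_vc->peer_ipv4 == 0xC0A80005u;
}
```

The sketch deliberately re-constructs the slot with the same type, which keeps
the stale access well-defined C++; in ATS itself the equivalent access would be
a use-after-free.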

> Seg Fault due to inactivity_cop after lost continuation from write_signal_and_update
> -------------------------------------------------------------------------------------
>
>                 Key: TS-3522
>                 URL: https://issues.apache.org/jira/browse/TS-3522
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Network
>            Reporter: Steven Feltner
>            Assignee: Alan M. Carroll
>             Fix For: 6.0.0
>
>
> (gdb) bt full
> #0  0x00000000006ec51e in handleEvent (event=105, vc=0x2b1c900461e0) at ../../iocore/eventsystem/I_Continuation.h:146
> No locals.
> #1  write_signal_and_update (event=105, vc=0x2b1c900461e0) at UnixNetVConnection.cc:154
> No locals.
> #2  0x00000000006ec837 in UnixNetVConnection::mainEvent (this=0x2b1c900461e0, event=<value optimized out>, e=<value optimized out>) at UnixNetVConnection.cc:1089
>         wlock = {m = {m_ptr = 0x2b1c90117dd0}, lock_acquired = true}
>         signal_event = 105
>         next_activity_timeout_at = 0
>         t = 0x0
>         hlock = {m = {m_ptr = 0x1430c30}, lock_acquired = true}
>         rlock = {m = {m_ptr = 0x2b1c90117dd0}, lock_acquired = true}
>         signal_timeout = 0x2b1c6b9ddc30
>         reader_cont = 0x0
>         writer_cont = 0x2b1d28051d48
>         signal_timeout_at = 0x2b1c900463f8
> #3  0x00000000006e5061 in handleEvent (this=0x14519d0, event=<value optimized out>, e=0x15792d0) at ../../iocore/eventsystem/I_Continuation.h:146
> No locals.
> #4  InactivityCop::check_inactivity (this=0x14519d0, event=<value optimized out>, e=0x15792d0) at UnixNet.cc:80
>         vc = 0x2b1c900461e0
>         lock = {m = {m_ptr = 0x2b1c90117dd0}, lock_acquired = true}
>         now = 1428965697221995775
>         nh = 0x2b1c695bea30
>         __func__ = "check_inactivity"
> #5  0x000000000070f628 in handleEvent (this=0x2b1c695bb010, e=0x15792d0, calling_code=2) at I_Continuation.h:146
> No locals.
> #6  EThread::process_event (this=0x2b1c695bb010, e=0x15792d0, calling_code=2) at UnixEThread.cc:144
>         c_temp = 0x14519d0
>         lock = {m = {m_ptr = 0x1430c30}, lock_acquired = true}
> #7  0x00000000007101c1 in EThread::execute (this=0x2b1c695bb010) at UnixEThread.cc:223
>         done_one = true
>         e = <value optimized out>
>         NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0x1579330}, tail = 0x1579330}
>         next_time = 1428963217761407178
> #8  0x000000000070ea52 in spawn_thread_internal (a=0x144a330) at Thread.cc:88
>         p = 0x144a330
> #9  0x000000383e8079d1 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #10 0x000000383e0e88fd in clone () from /lib64/libc.so.6
> No symbol table info available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
