[jira] [Work logged] (TS-4970) Crash in INKVConnInternal when handle_event is called after destroy()

ASF GitHub Bot (JIRA) Fri, 14 Oct 2016 06:44:58 -0700

     [ 
https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30632&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30632
 ]


ASF GitHub Bot logged work on TS-4970:
--------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Oct/16 13:43
            Start Date: 14/Oct/16 13:43
    Worklog Time Spent: 10m 
      Work Description: GitHub user igalic commented on a diff in the pull request:

    https://github.com/apache/trafficserver/pull/1108#discussion_r83423846
  
    --- Diff: proxy/InkAPI.cc ---
    @@ -1053,15 +1053,14 @@ int
     INKVConnInternal::handle_event(int event, void *edata)
     {
       handle_event_count(event);
    -  if (m_deleted) {
    -    if (m_deletable) {
    -      this->mutex = NULL;
    -      m_read_vio.set_continuation(NULL);
    -      m_write_vio.set_continuation(NULL);
    -      INKVConnAllocator.free(this);
    -    }
    -  } else {
    +  // If the VConn isn't deleted, call the handler
    +  if (!m_deleted) {
         return m_event_func((TSCont) this, (TSEvent) event, edata);
    +  } else {
    --- End diff --
    
    why is this an `else` if we previously `return`ed?
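    The reviewer's point is that the `else` is redundant, because the `!m_deleted` branch already returns. Below is a minimal sketch of the same guard written as an early return. It uses a hypothetical stand-in class (`SketchVConn`) and a hypothetical `kEventDone` constant in place of the real `INKVConnInternal` and `EVENT_DONE`. It is not the code from the pull request.

```cpp
#include <cassert>
#include <functional>

// Hypothetical placeholder for ATS's EVENT_DONE; the real value is not shown here.
constexpr int kEventDone = 0;

// Hypothetical stand-in for INKVConnInternal, reduced to what the guard needs.
struct SketchVConn {
  bool m_deleted = false;
  std::function<int(int event, void *edata)> m_event_func;

  int handle_event(int event, void *edata) {
    // Live VConn: hand the event to the plugin's handler and return early.
    if (!m_deleted) {
      assert(m_event_func && "a live VConn needs an event handler");
      return m_event_func(event, edata);
    }
    // No `else` needed: this point is only reached for a deleted VConn.
    return kEventDone;
  }
};
```

    Both shapes behave the same. The early return keeps the deleted-VConn path at the top level of the function instead of nesting it under an `else`.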


Issue Time Tracking
-------------------

    Worklog Id:     (was: 30632)
    Time Spent: 1.5h  (was: 1h 20m)

> Crash in INKVConnInternal when handle_event is called after destroy()
> ---------------------------------------------------------------------
>
>                 Key: TS-4970
>                 URL: https://issues.apache.org/jira/browse/TS-4970
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: HTTP
>            Reporter: Thomas Jackson
>            Assignee: Thomas Jackson
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We've noticed a few crashes (on ATS 5.2.x and 6.x) for requests using SPDY
> where the downstream origin is down, with a backtrace that looks something
> like:
> {code}
> (gdb) bt
> #0  0x0000000000000000 in ?? ()
> #1  0x00000000004cfe54 in set_continuation (this=0x2afe63a93530, event=1, 
>     edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104
> #2  INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, 
>     edata=0x2afe6399fc40) at InkAPI.cc:1060
> #3  0x00000000006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, 
>     calling_code=1) at I_Continuation.h:146
> #4  EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, 
>     calling_code=1) at UnixEThread.cc:144
> #5  0x00000000006f993b in EThread::execute (this=0x2afe3dd07000)
>     at UnixEThread.cc:195
> #6  0x00000000006f832a in spawn_thread_internal (a=0x2afe3badf400)
>     at Thread.cc:88
> #7  0x0000003861c079d1 in start_thread () from /lib64/libpthread.so.0
> #8  0x00000038614e8b5d in clone () from /lib64/libc.so.6
> {code}
> This looks a bit odd, as frame 0 is a call to address 0x0 with no symbol.
> Digging into it a bit more (with the help of [~amc]), we found that the VC we
> were calling was an INKContInternal (more precisely, an INKVConnInternal):
> {code}
> (gdb) p (INKVConnInternal) *vc_server
> $5 = {<INKContInternal> = {<DummyVConnection> = {<VConnection> = 
> {<Continuation> = {<force_VFPT_to_top> = {_vptr.force_VFPT_to_top = 
> 0x2afe63a93170}, 
>           handler = (int (Continuation::*)(Continuation *, int, 
>     void *)) 0x4cfd90 <INKVConnInternal::handle_event(int, void*)>, mutex = {
>             m_ptr = 0x0}, link = {<SLink<Continuation>> = {next = 0x0}, 
>             prev = 0x0}}, lerrno = 20600}, <No data fields>}, 
>     mdata = 0xdeaddead, m_event_func = 0x2afe43c18490
>      <(anonymous namespace)::handleTransformationPluginEvents(TSCont, 
> TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, 
> m_deleted = 1, 
>     m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, 
>     nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
>     vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, 
>     nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, 
>     vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, 
>   m_output_vc = 0x2afe63091a88}
> {code}
> The debug logs leading up to the crash show that some events (namely timeout
> events) are being delivered after the VConn has been destroy()'d. This led me
> to find that INKVConnInternal::handle_event does check for that case, but then
> re-destroys everything, which makes no sense.
> Although ideally handle_event would never be called on a closed VConn,
> crashing is definitely not acceptable. My solution is to keep calling the
> event handler only if the VConn hasn't been deleted, but instead of attempting
> to re-destroy the connection, leave it alone (except in debug builds, where
> I'll add an assert).
> I looked at this on ATS 7, and it appears to be fixed there by the cleanup of
> the VConn freeing logic
> (https://github.com/apache/trafficserver/pull/752/files).
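The before/after behavior described in the report can be sketched with a hypothetical model (`VConnModel`), not the real `INKVConnInternal`. The model covers only the state shown in the gdb dump, where `m_deletable` and `m_deleted` are both 1. It assumes that `destroy()` marks the VConn deleted and frees it when it is deletable. Counters stand in for `INKVConnAllocator.free(this)` and for calling `m_event_func`, so the second teardown can be observed without freeing memory twice. The debug-only assert is modeled with the standard `assert` behind a hypothetical `SKETCH_ASSERT_LATE_EVENTS` macro. The report does not show which assert macro the patch actually uses.

```cpp
#include <cassert>

// Hypothetical model of INKVConnInternal's lifecycle, not the real class.
// `teardowns` stands in for INKVConnAllocator.free(this) and `dispatched`
// for calling m_event_func. Counters replace real freeing so a second
// teardown is observable without undefined behavior.
struct VConnModel {
  bool m_deleted = false;
  bool m_deletable = true;  // matches m_deletable = 1 in the gdb dump
  int teardowns = 0;
  int dispatched = 0;
  int ignored_late = 0;

  // Assumption: destroy() marks the VConn deleted and, if deletable, frees it.
  void destroy() {
    assert(!m_deleted && "model assumes destroy() runs once");
    m_deleted = true;
    if (m_deletable) {
      ++teardowns;
    }
  }

  // Pre-fix shape: a late event on a deleted VConn tears it down again.
  int handle_event_old(int /*event*/) {
    if (m_deleted) {
      if (m_deletable) {
        ++teardowns;  // second teardown of an already-freed VConn
      }
    } else {
      ++dispatched;
    }
    return 0;
  }

  // Proposed shape: dispatch only while alive, otherwise leave it alone.
  int handle_event_fixed(int /*event*/) {
    if (!m_deleted) {
      ++dispatched;
      return 0;
    }
#ifdef SKETCH_ASSERT_LATE_EVENTS
    // Hypothetical debug switch: fail loudly on a late event.
    assert(!"handle_event called on a destroyed VConn");
#endif
    ++ignored_late;  // observation only; the real fix simply returns
    return 0;
  }
};
```

In the real class, once the VConn has been freed, even reading `m_deleted` on a late event depends on that memory not having been reused. The guard therefore limits the damage rather than making late events safe. Not delivering those events at all remains the ideal that the report describes.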



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
