[ https://issues.apache.org/jira/browse/TS-4970?focusedWorklogId=30632&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-30632 ]
ASF GitHub Bot logged work on TS-4970: -------------------------------------- Author: ASF GitHub Bot Created on: 14/Oct/16 13:43 Start Date: 14/Oct/16 13:43 Worklog Time Spent: 10m Work Description: Github user igalic commented on a diff in the pull request: https://github.com/apache/trafficserver/pull/1108#discussion_r83423846 --- Diff: proxy/InkAPI.cc --- @@ -1053,15 +1053,14 @@ int INKVConnInternal::handle_event(int event, void *edata) { handle_event_count(event); - if (m_deleted) { - if (m_deletable) { - this->mutex = NULL; - m_read_vio.set_continuation(NULL); - m_write_vio.set_continuation(NULL); - INKVConnAllocator.free(this); - } - } else { + // If the VConn isn't deleted, call the handler + if (!m_deleted) { return m_event_func((TSCont) this, (TSEvent) event, edata); + } else { --- End diff -- why is this an `else` if we previously `return`ed? Issue Time Tracking ------------------- Worklog Id: (was: 30632) Time Spent: 1.5h (was: 1h 20m) > Crash in INKVConnInternal when handle_event is called after destroy() > --------------------------------------------------------------------- > > Key: TS-4970 > URL: https://issues.apache.org/jira/browse/TS-4970 > Project: Traffic Server > Issue Type: Bug > Components: HTTP > Reporter: Thomas Jackson > Assignee: Thomas Jackson > Time Spent: 1.5h > Remaining Estimate: 0h > > We've noticed a few crashes for requests using SPDY (on ATS 5.2.x and 6..x) > where the downstream origin is down with a backtrace that looks something > like: > {code} > (gdb) bt > #0 0x0000000000000000 in ?? () > #1 0x00000000004cfe54 in set_continuation (this=0x2afe63a93530, event=1, > edata=0x2afe6399fc40) at ../iocore/eventsystem/P_VIO.h:104 > #2 INKVConnInternal::handle_event (this=0x2afe63a93530, event=1, > edata=0x2afe6399fc40) at InkAPI.cc:1060 > #3 0x00000000006f8e65 in handleEvent (this=0x2afe3dd07000, e=0x2afe6399fc40, > calling_code=1) at I_Continuation.h:146 > #4 EThread::process_event (this=0x2afe3dd07000, e=0x2afe6399fc40, > calling_code=1) at UnixEThread.cc:144 > #5 0x00000000006f993b in EThread::execute (this=0x2afe3dd07000) > at UnixEThread.cc:195 > #6 0x00000000006f832a in spawn_thread_internal (a=0x2afe3badf400) > at Thread.cc:88 > #7 0x0000003861c079d1 in start_thread () from /lib64/libpthread.so.0 > #8 0x00000038614e8b5d in clone () from /lib64/libc.so.6 > {code} > Which looks a bit odd-- as frame 0 is missing. From digging into it a bit > more (with the help of [~amc]) we found that the VC we where calling was an > INKContInternal (meaning an INKVConnInternal): > {code} > (gdb) p (INKVConnInternal) *vc_server > $5 = {<INKContInternal> = {<DummyVConnection> = {<VConnection> = > {<Continuation> = {<force_VFPT_to_top> = {_vptr.force_VFPT_to_top = > 0x2afe63a93170}, > handler = (int (Continuation::*)(Continuation *, int, > void *)) 0x4cfd90 <INKVConnInternal::handle_event(int, void*)>, mutex = { > m_ptr = 0x0}, link = {<SLink<Continuation>> = {next = 0x0}, > prev = 0x0}}, lerrno = 20600}, <No data fields>}, > mdata = 0xdeaddead, m_event_func = 0x2afe43c18490 > <(anonymous namespace)::handleTransformationPluginEvents(TSCont, > TSEvent, void*)>, m_event_count = 0, m_closed = -1, m_deletable = 1, > m_deleted = 1, > m_free_magic = INKCONT_INTERN_MAGIC_ALIVE}, m_read_vio = {_cont = 0x0, > nbytes = 0, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, > vc_server = 0x0, mutex = {m_ptr = 0x0}}, m_write_vio = {_cont = 0x0, > nbytes = 122, ndone = 0, op = 0, buffer = {mbuf = 0x0, entry = 0x0}, > vc_server = 0x2afe63a93530, mutex = {m_ptr = 0x0}}, > m_output_vc = 0x2afe63091a88} > {code} > From looking at the debug logs that lead up to the crash, I'm seeing that > some events (namely timeout events) are being called after the VConn has been > destroy()'d . This lead me to find that INKVConnInternal::handle_event is > actually checking if that is the case-- and then re-destroying everything, > which makes no sense. > So although the ideal would be to not call handle_event on a closed VConn, > crashing is definitely not acceptable. My solution is to continue to only > call the event handler if the VConn hasn't been deleted-- but instead of > attempting to re-destroy the connection, we'll leave it be (unless we are in > debug mode-- where I'll throw in an assert). > I did some looking at this on ATS7 and it looks like this is all fixed by the > cleanup of the whole free-ing stuff for VConns > (https://github.com/apache/trafficserver/pull/752/files). -- This message was sent by Atlassian JIRA (v6.3.4#6332)