[ 
https://issues.apache.org/jira/browse/TS-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163156#comment-13163156
 ] 

John Plevyak commented on TS-937:
---------------------------------

This patch shouldn't have any effect in correct code.  Checking ->canceled 
outside of the mutex is at best an optimization.  In any case, before the 
continuation is used the canceled field has to be checked under the lock.  Most 
likely the core problem is that someone is canceling the event without holding 
the lock (a situation which subsumes the case of being in a callback as weijin 
suggested).  If this happens, there is a potential race where process_event is 
in progress, someone cancels the event (without the lock) and deletes the 
Continuation resulting in a crash.

However, this patch does reduce the change of a race condition in the debug 
build.   The core problem is that the macro MUTEX_LOCK_FOR is accessing 
HANDLER_NAME
which dereferences the continuation.  This is bad.  The macro needs to be fixed 
to delay deferencing the Continuation until the lock is taken.
                
> EThread::execute still processing cancelled event
> -------------------------------------------------
>
>                 Key: TS-937
>                 URL: https://issues.apache.org/jira/browse/TS-937
>             Project: Traffic Server
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 3.0.1, 2.1.9
>         Environment: RHEL6
>            Reporter: Brian Geffon
>             Fix For: 3.1.2
>
>         Attachments: UnixEThread.patch
>
>
> The included GDB log will show that ATS is trying to process an event that 
> has already been canceled, examining the code of UnixEThread.cc line 232 
> shows that EThread::process_event gets called without a check for the event 
> being cancelled. 
> Brian
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffff64fa700 (LWP 28518)]
> 0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, 
> e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
> 130  MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
> Missing separate debuginfos, use: debuginfo-install 
> expat-2.0.1-9.1.el6.x86_64 glibc-2.12-1.25.el6_1.3.x86_64 
> keyutils-libs-1.4-1.el6.x86_64 krb5-libs-1.9-9.el6_1.1.x86_64 
> libcom_err-1.41.12-7.el6.x86_64 libgcc-4.4.5-6.el6.x86_64 
> libselinux-2.0.94-5.el6.x86_64 libstdc++-4.4.5-6.el6.x86_64 
> openssl-1.0.0-10.el6_1.4.x86_64 pcre-7.8-3.1.el6.x86_64 
> tcl-8.5.7-6.el6.x86_64 zlib-1.2.3-25.el6.x86_64
> (gdb) bt
> #0  0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, 
> e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
> #1  0x00000000006fcbaf in EThread::execute (this=0x7ffff68ff010) at 
> UnixEThread.cc:232
> #2  0x00000000006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
> #3  0x00000036204077e1 in start_thread () from /lib64/libpthread.so.0
> #4  0x000000361f8e577d in clone () from /lib64/libc.so.6
> (gdb) bt full
> #0  0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, 
> e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
>         lock = {m = {m_ptr = 0x7ffff64f9d20}, lock_acquired = 202}
> #1  0x00000000006fcbaf in EThread::execute (this=0x7ffff68ff010) at 
> UnixEThread.cc:232
>         done_one = false
>         e = 0x1db45c0
>         NegativeQueue = {<DLL<Event, Event::Link_link>> = {head = 0xfc75f0}, 
> tail = 0xfc75f0}
>         next_time = 1314647904419648000
> #2  0x00000000006fb844 in spawn_thread_internal (a=0xfb7e80) at Thread.cc:88
>         p = 0xfb7e80
> #3  0x00000036204077e1 in start_thread () from /lib64/libpthread.so.0
> No symbol table info available.
> #4  0x000000361f8e577d in clone () from /lib64/libc.so.6
> No symbol table info available.
> (gdb) f 0
> #0  0x00000000006fc663 in EThread::process_event (this=0x7ffff68ff010, 
> e=0x1db45c0, calling_code=1) at UnixEThread.cc:130
> 130  MUTEX_TRY_LOCK_FOR(lock, e->mutex.m_ptr, this, e->continuation);
> (gdb) p *e
> $2 = {<Action> = {_vptr.Action = 0x775170, continuation = 0x1f2fc08, mutex = 
> {m_ptr = 0x7fffd40fba40}, cancelled = 1}, ethread = 0x7ffff68ff010, 
> in_the_prot_queue = 0, in_the_priority_queue = 0, 
>   immediate = 1, globally_allocated = 1, in_heap = 0, callback_event = 1, 
> timeout_at = 0, period = 0, cookie = 0x0, link = {<SLink<Event>> = {next = 
> 0x0}, prev = 0x0}}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to