[ 
https://issues.apache.org/jira/browse/TS-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262443#comment-13262443
 ] 

Brian Geffon edited comment on TS-1219 at 4/26/12 7:49 AM:
-----------------------------------------------------------

I have a possible theory for the crash, but I think jplevyak might be the only 
one who could say for sure. In 3.0.x inkfreelist_new follows the standard 
pattern which does a load followed by a CAS:

 int result = 0;

 do {

    INK_QUEUE_LD64(item, f->head);

    if (TO_PTR(FREELIST_POINTER(item)) == NULL) {

       ...

    } else {

       // crash happens on the following line
       SET_FREELIST_POINTER_VERSION(next, 
*ADDRESS_OF_NEXT(TO_PTR(FREELIST_POINTER(item)), f->offset),
                                   FREELIST_VERSION(item) + 1);


      result = ink_atomic_cas64((int64_t *) & f->head.data, item.data, 
next.data);

    }

 }

 while(result == 0)


I believe this might be a race in the event that the head is freed between the 
time of the load to item and the SET_FREELIST_POINTER_VERSION, then we would 
potentially be dereferencing item, which is no longer valid if it's freed as 
part of a dequeue operation. Does anyone who understands the freelists think 
this might be the situation?
                
      was (Author: briang):
    I have a possible theory for the crash, but I think jplevyak might be the 
only one who could say for sure. In 3.0.x inkfreelist_new follows the standard 
pattern which does a load followed by a CAS:

 int result = 0;
 do {
    INK_QUEUE_LD64(item, f->head);
    if (TO_PTR(FREELIST_POINTER(item)) == NULL) {
       ...
    } else {
       // crash happens on the following line
       SET_FREELIST_POINTER_VERSION(next, 
*ADDRESS_OF_NEXT(TO_PTR(FREELIST_POINTER(item)), f->offset),
                                   FREELIST_VERSION(item) + 1);

      result = ink_atomic_cas64((int64_t *) & f->head.data, item.data, 
next.data);
    }
 }
 while(result == 0)


I believe this might be a race in the event that the head is freed between the 
time of the load to item and the SET_FREELIST_POINTER_VERSION, then we would 
potentially be dereferencing item, which is no longer valid if it's freed as 
part of a dequeue operation. Does anyone who understands the freelists think 
this might be the situation?
                  
> Crash report: ink_freelist_new causing core dumps
> -------------------------------------------------
>
>                 Key: TS-1219
>                 URL: https://issues.apache.org/jira/browse/TS-1219
>             Project: Traffic Server
>          Issue Type: Bug
>    Affects Versions: 3.0.2
>            Reporter: Manjesh Nilange
>
> Our TS based production boxes are seeing a couple of crashes per day with 
> stack traces ending in
> #3  0x00000000004c85ea in signal_handler (sig=11) at signals.cc:225
> #4  <signal handler called>
> #5  0x0000003312414160 in ink_freelist_new (f=0x3312629ce0) at 
> ink_queue.cc:326
> #6  0x000000331240d47a in alloc_void (this=0x7fff80173cd8, size=511, 
> alignment=1) at Allocator.h:208
> #7  blk_alloc (this=0x7fff80173cd8, size=511, alignment=1) at Arena.cc:45
> #8  Arena::alloc (this=0x7fff80173cd8, size=511, alignment=1) at Arena.cc:118
> #9  0x0000000000569b93 in str_alloc (arena=<value optimized out>, 
>     url=0x31be6a8 "*************"..., 
>     len_in=<value optimized out>, len_out=0x7fff80173d20) at 
> ../../lib/ts/Arena.h:123
> #10 LogUtils::escapify_url (arena=<value optimized out>, 
>     url=0x31be6a8 "*************"..., 
>     len_in=<value optimized out>, len_out=0x7fff80173d20) at LogUtils.cc:392
> #11 0x00000000005482e9 in LogAccessHttp::init (this=0x7fff80173cc0) at 
> LogAccessHttp.cc:98
> #12 0x000000000054b436 in Log::access (lad=0x7fff80173cc0) at Log.cc:1055
> #13 0x00000000004f3085 in HttpSM::update_stats (this=0x2ae5480651e0) at 
> HttpSM.cc:6044
> #14 0x00000000004f8918 in HttpSM::kill_this (this=0x2ae5480651e0) at 
> HttpSM.cc:6005
> #15 0x00000000004f8c08 in HttpSM::main_handler (this=0x2ae5480651e0, 
> event=2301, data=0x2ae548066ec8)
>     at HttpSM.cc:2452
> The URL was valid, I just "anonymized" it. This is our environment
> $ uname -a
> Linux xxx.prod 2.6.32-131.4.1.el6.x86_64 #1 SMP Fri Jun 10 10:54:26 EDT 2011 
> x86_64 x86_64 x86_64 GNU/Linux
> $ file /usr/bin/traffic_server 
> /usr/bin/traffic_server: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), 
> dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
> There doesn't seem to be more useful information in the frame:
> (gdb) f 5
> #5  0x0000003312414160 in ink_freelist_new (f=0x3312629ce0) at 
> ink_queue.cc:326
> 326                                      FREELIST_VERSION(item) + 1);
> (gdb) list
> 321         fastalloc_mem_in_use += f->chunk_size * f->type_size;
> 322   #endif
> 323   
> 324       } else {
> 325         SET_FREELIST_POINTER_VERSION(next, 
> *ADDRESS_OF_NEXT(TO_PTR(FREELIST_POINTER(item)), f->offset),
> 326                                      FREELIST_VERSION(item) + 1);
> 327   #if !defined(INK_USE_MUTEX_FOR_FREELISTS)
> 328         result = ink_atomic_cas64((int64_t *) & f->head.data, item.data, 
> next.data);
> 329   #else
> 330         f->head.data = next.data;
> (gdb) p next
> $2 = <value optimized out>
> (gdb) p f
> $3 = (InkFreeList *) 0x3312629ce0
> (gdb) p *f 
> $4 = {head = {data = -8941773651046140890}, name = 0x331241e92f "ArenaBlock", 
> type_size = 1024, chunk_size = 128, 
>   count = 519, allocated = 1536, offset = 0, alignment = 8, allocated_base = 
> 0, count_base = 0}
> (gdb) p item
> $5 = {data = -8953314125091277786}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to