DO NOT REPLY [Bug 44402] - Worker mpm crashes (SEGV) under stress with static workload on solaris x86

bugzilla Wed, 13 Feb 2008 22:56:25 -0800

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG·
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=44402>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND·
INSERTED IN THE BUG DATABASE.


http://issues.apache.org/bugzilla/show_bug.cgi?id=44402


[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW




------- Additional Comments From [EMAIL PROTECTED]  2008-02-13 22:56 -------
I did the stress test with the patch you suggested. After your patch, I
still got the 1st crash. If it crashed in second stack trace then I will update
the bug.

Here are some more information about 1st crash.
* I am able to reproduce the crash on Solaris 10 update 1 (on a different
  machine) too. It took around 4 hours of stress before I got the crash on
  Solaris 10 while it takes around < 30 minutes to reproduce on Solaris nevada.
  It was crash 1 (allocator_free with node = null) (without your patch).

Here is more information of the crash 1 :
(dbx) where
current thread: [EMAIL PROTECTED]
=>[1] allocator_free(allocator = 0x8afe2e0, node = (nil)), line 331 in 
"apr_pools.c"
  [2] apr_pool_clear(pool = 0xa0d01c0), line 710 in "apr_pools.c"
  [3] ap_core_output_filter(f = 0xa0b28c8, b = 0xa0b2a08), line 899 in
"core_filters.c"
  [4] ap_pass_brigade(next = 0xa0b28c8, bb = 0xa0b2a08), line 526 in 
"util_filter.c"
  [5] logio_out_filter(f = 0xa0b2888, bb = 0xa0b2a08), line 135 in "mod_logio.c"
  [6] ap_pass_brigade(next = 0xa0b2888, bb = 0xa0b2a08), line 526 in 
"util_filter.c"
  [7] ap_flush_conn(c = 0xa0b23e8), line 84 in "connection.c"
  [8] ap_lingering_close(c = 0xa0b23e8), line 123 in "connection.c"
  [9] process_socket(p = 0x8afe368, sock = 0x8aff660, my_child_num = 1,
my_thread_num = 18, bucket_alloc = 0xa0be178), line 545 in "worker.c"
  [10] worker_thread(thd = 0x81487d8, dummy = 0x8117b30), line 894 in "worker.c"
  [11] dummy_worker(opaque = 0x81487d8), line 142 in "thread.c"
  [12] _thr_setup(0xfe244800), at 0xfeccf92e
  [13] _lwp_start(), at 0xfeccfc10
(dbx) up
Current function is apr_pool_clear
  710       allocator_free(pool->allocator, active->next);
(dbx) p *active
*active = {
    next        = (nil)
    ref         = 0xa0d01a8
    index       = 1U
    free_index  = 0
    first_avail = 0xa0d01f8 "\xc0^A^M\n\xfc^A^M\n\xfc^A^M\nx\xe1^K\n"
    endp        = 0xa0d21a8 "^A "
}
(dbx) up
Current function is ap_core_output_filter
  899               apr_pool_clear(ctx->deferred_write_pool);
(dbx) p *ctx
*ctx = {
    b                   = (nil)
    deferred_write_pool = 0xa0d01c0
}
(dbx) p *ctx->deferred_write_pool
*ctx->deferred_write_pool = {
    parent           = 0x8afe368
    child            = (nil)
    sibling          = 0xa0c6198
    ref              = 0x8afe36c
    cleanups         = (nil)
    free_cleanups    = (nil)
    allocator        = 0x8afe2e0
    subprocesses     = (nil)
    abort_fn         = (nil)
    user_data        = (nil)
    tag              = 0x80bfd1c "deferred_write"
    active           = 0xa0d01a8
    self             = 0xa0d01a8
    self_first_avail = 0xa0d01f8 "\xc0^A^M\n\xfc^A^M\n\xfc^A^M\nx\xe1^K\n"
}
(dbx) p *c
*c = {
    pool                  = 0x8afe368
    base_server           = 0x80e6bf8
    vhost_lookup_data     = (nil)
    local_addr            = 0x8aff698
    remote_addr           = 0x8aff7c0
    remote_ip             = 0xa0b2850 "192.168.11.1"
    remote_host           = (nil)
    remote_logname        = (nil)
    aborted               = 0
    keepalive             = AP_CONN_KEEPALIVE
    double_reverse        = 0
    keepalives            = 1
    local_ip              = 0xa0b2840 "192.168.11.2"
    local_host            = (nil)
    id                    = 518
    conn_config           = 0xa0b2448
    notes                 = 0xa0b26e8
    input_filters         = 0xa0b2870
    output_filters        = 0xa0b2888
    sbh                   = 0xa0b23e0
    bucket_alloc          = 0xa0be178
    cs                    = (nil)
    data_in_input_filters = 0
}

One putting some printfs I figured out the following :

In apr_pool_clear (when invoked for deferred_write_pool)
    ...
    active = pool->active = pool->self;
    active->first_avail = pool->self_first_avail;

    if (active->next == active)
        return;

active->next should typically be s circular link list. What is happenning some
cases is that active->next points to some thing else and active->ref still
points to active->next.  I put a printf of active->next before it is set to
NULL. For a particular crash, here is my debugging session. I found that
active->next
was set to 0x20e8810 before it was set to NULL.

(dbx) up
Current function is apr_pool_clear
  774       allocator_free(pool->allocator, active->next);
(dbx) up
Current function is ap_core_output_filter
  923               apr_pool_clear(ctx->deferred_write_pool);
(dbx) p (struct apr_memnode_t*)0x20e8810 -----> This was active->next before set
to NULL.
(struct apr_memnode_t *) 0x20e8810 = 0x20e8810
(dbx) p *(struct apr_memnode_t*)0x20e8810
*((struct apr_memnode_t *) 0x20e8810) = {
    next        = 0x288c5b0
    ref         = 0x20e8810
    index       = 1U
    free_index  = 0
    first_avail = 0x20e9eb0 "GET /file_set/dir00104/class1_3 HTTP/1.0"
    endp        = 0x20ea810 "^A "
}
(dbx) down
Current function is apr_pool_clear
  774       allocator_free(pool->allocator, active->next);
(dbx) p active
active = 0x20e27e0
(dbx) p *((struct apr_memnode_t*)0x20e8810)->next
*((struct apr_memnode_t *) 0x20e8810)->next = {
    next        = 0x20e07d0
    ref         = 0x20e07d0
    index       = 1U
    free_index  = 0
    first_avail = 0x288d008 ""
    endp        = 0x288e5b0 "^A "
}
(dbx) p active
active = 0x20e27e0
(dbx) p *(((struct apr_memnode_t*)0x20e8810)->next)->next
*((struct apr_memnode_t *) 0x20e8810)->next->next = {
    next        = 0x28905d0
    ref         = 0x288c5b0
    index       = 1U
    free_index  = 0
    first_avail = 0x20e2738 ""
    endp        = 0x20e27d0 "^A "
}
(dbx) p *((((struct apr_memnode_t*)0x20e8810)->next)->next)->next
*((struct apr_memnode_t *) 0x20e8810)->next->next->next = {
    next        = 0x288e5c0
    ref         = 0x28905d0
    index       = 1U
    free_index  = 0
    first_avail = 0x2890668 "\xf8^E\x89^B"
    endp        = 0x28925d0 "^Q^P"
}
(dbx) p *(((((struct apr_memnode_t*)0x20e8810)->next)->next)->next)->next
*((struct apr_memnode_t *) 0x20e8810)->next->next->next->next = {
    next        = (nil)
    ref         = (nil)
    index       = 1U
    free_index  = 0
    first_avail = 0x288e5e8 "`^_"
    endp        = 0x28905c0 "^A "
}

On further debugging, I figured out that typically ap_core_output_filter is
called 4 times for a request. The crash always happen in 4th invocation. It
seems to me that it gets corrupted somewhere after the 3rd invocation (after it
returns from ap_core_output_filter) and before it enters into
ap_core_output_filter 4th time (when ap_lingering_close is in call stack). Also
conn->keepalives was always set to 1.


-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

DO NOT REPLY [Bug 44402] - Worker mpm crashes (SEGV) under stress with static workload on solaris x86

Reply via email to