[Bug 49504] Solaris 10/x64 worker graceful restart problem

bugzilla Wed, 31 Oct 2012 15:46:26 -0700

https://issues.apache.org/bugzilla/show_bug.cgi?id=49504


[email protected] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]

--- Comment #2 from [email protected] ---
As part of restarting apache, we'll send httpd a graceful restart
signal.  When that happens, many of the prefork children get stuck
trying to exit with stacks that look like this.  These processes never
finish exiting, and hang out indefinitely consuming system resources.

151404: /usr/apache2/current/bin/httpd -f /var/run/ak/httpd.conf -k start
 fee9bd17 lwp_mutex_timedlock (fb3a0000, 0, fec72a40)
 fee8e12c mutex_lock_internal (fb3a0000, 0, 1, fee8e521) + 254
 fee8e6f0 mutex_lock_impl (fb3a0000, 0, 806954d, fee8e7ab) + 1e0
 fee8e7de mutex_lock (fb3a0000, 0, 8293228, fec9c0d1) + 42
 fec9c0e6 proc_mutex_proc_pthread_acquire (8198260, 0, 17, 80dfe00, 8047b98,
fefa0970) + 22
 fec9c5d2 apr_proc_mutex_lock (8198260, 2, 0, 0) + 16
 080a90ba child_main (17, 80a8a90, 8047c78, 80ab11b) + 2ba
 080ab160 ap_mpm_run (80e1230, 810f2e8, 80e3128, 8072703) + 654
 08072710 main     (5, 8047d38, 8047d50, 8047d2c) + da8
 080718cd _start   (5, 8047e14, 8047e33, 8047e36, 8047e4d, 8047e50) + 7d

The process is blocked trying to acquire the accept_mutex in
child_main().  Looking at a variety of the accept_mutex fields in the
processes shows many that are uninitialized, like the two below:

fb3a0000 {
    fb3a0000 flags = {
        fb3a0000 flag1 = 0x1
        fb3a0002 flag2 = 0
        fb3a0003 ceiling = 0
        fb3a0004 mbcp_type_un = {
            fb3a0004 bcptype = 0
            fb3a0004 mtype_rcount = {
                fb3a0004 count_type1 = 0
                fb3a0005 count_type2 = 0
            }
        }
        fb3a0006 magic = 0
    }
    fb3a0008 lock = {
        fb3a0008 lock64 = {
            fb3a0008 pad = [ 0, 0, 0, 0, 0, 0, 0x1, 0x1 ]
        }
        fb3a0008 lock32 = {
            fb3a0008 ownerpid = 0
            fb3a000c lockword = 0x1010000
        }
        fb3a0008 owner64 = 0x101000000000000
    }
    fb3a0010 data = 0xfec72a40
}
fb3a0000 {
    fb3a0000 flags = {
        fb3a0000 flag1 = 0
        fb3a0002 flag2 = 0
        fb3a0003 ceiling = 0
        fb3a0004 mbcp_type_un = {
            fb3a0004 bcptype = 0
            fb3a0004 mtype_rcount = {
                fb3a0004 count_type1 = 0
                fb3a0005 count_type2 = 0
            }
        }
        fb3a0006 magic = 0
    }
    fb3a0008 lock = {
        fb3a0008 lock64 = {
            fb3a0008 pad = [ 0, 0, 0, 0, 0, 0, 0x1, 0x1 ]
        }
        fb3a0008 lock32 = {
            fb3a0008 ownerpid = 0
            fb3a000c lockword = 0x1010000
        }
        fb3a0008 owner64 = 0x101000000000000
    }
    fb3a0010 data = 0xfec72a40
}

These don't have their magic bits set, and don't have flags that
correspond to being a process robust mutex.  Here's what a good one
ought to look like:

fb3a0000 {
    fb3a0000 flags = {
        fb3a0000 flag1 = 0x4
        fb3a0002 flag2 = 0
        fb3a0003 ceiling = 0
        fb3a0004 mbcp_type_un = {
            fb3a0004 bcptype = 0x51
            fb3a0004 mtype_rcount = {
                fb3a0004 count_type1 = 0x51
                fb3a0005 count_type2 = 0
            }
        }
        fb3a0006 magic = 0x4d58
    }
    fb3a0008 lock = {
        fb3a0008 lock64 = {
            fb3a0008 pad = [ 0xf2, 0xf9, 0x1, 0, 0, 0, 0, 0x1 ]
        }
        fb3a0008 lock32 = {
            fb3a0008 ownerpid = 0x1f9f2
            fb3a000c lockword = 0x1000000
        }
        fb3a0008 owner64 = 0x10000000001f9f2
    }
    fb3a0010 data = 0xfec72a40
}

Since we got lucky, and had all of the processes map this mutex at the
same location in their address space, it's possible to walk the mapping
segments in the kernel, and observe who is sharing anonymous mappings.
When a process uses mmap(2) with MAP_SHARED and MAP_ANON, all processes
that share the same anonymous mapping use the same struct anon_map.

Here are the refcounts for all of the anon_map entries for segments
mapped at this address.

fffff6033c5493e0 refcnt = 0x6
fffff6033c5493e0 refcnt = 0x6
fffff6033c5493e0 refcnt = 0x6
fffff6033c5493e0 refcnt = 0x6
fffff6033c5493e0 refcnt = 0x6
fffff6033c5493e0 refcnt = 0x6
fffff60331b0a938 refcnt = 0x5
fffff60331b0a938 refcnt = 0x5
fffff60331b0a938 refcnt = 0x5
fffff60331b0a938 refcnt = 0x5
fffff60331b0a938 refcnt = 0x5
fffff60315a4c240 refcnt = 0x3
fffff60315a4c240 refcnt = 0x3
fffff60315a4c240 refcnt = 0x3
fffff60336801470 refcnt = 0x3
fffff60336801470 refcnt = 0x3
fffff60336801470 refcnt = 0x3
fffff6035cb691c8 refcnt = 0x2
fffff6035cb691c8 refcnt = 0x2
fffff6032f20e4e8 refcnt = 0x2
fffff6032f20e4e8 refcnt = 0x2
fffff6033c30e030 refcnt = 0x1
fffff60385a833d8 refcnt = 0x2
fffff60385a833d8 refcnt = 0x2
fffff6036a7f2f08 refcnt = 0x3
fffff6036a7f2f08 refcnt = 0x3
fffff6036a7f2f08 refcnt = 0x3
fffff603eb317f08 refcnt = 0x4
fffff603eb317f08 refcnt = 0x4
fffff603eb317f08 refcnt = 0x4
fffff603eb317f08 refcnt = 0x4
fffff60496ccebe0 refcnt = 0x2
fffff60496ccebe0 refcnt = 0x2
fffff60443485580 refcnt = 0x3
fffff60443485580 refcnt = 0x3
fffff60443485580 refcnt = 0x3
fffff6026ef6fb50 refcnt = 0x3
fffff6026ef6fb50 refcnt = 0x3
fffff6026ef6fb50 refcnt = 0x3
fffff60425527d68 refcnt = 0x3
fffff60425527d68 refcnt = 0x3
fffff60425527d68 refcnt = 0x3
fffff6016d080d70 refcnt = 0x1
fffff60088c6f8b0 refcnt = 0x2
fffff60088c6f8b0 refcnt = 0x2
fffff60314dd5028 refcnt = 0x2
fffff60314dd5028 refcnt = 0x2
fffff60308085ac0 refcnt = 0x1
If the mutex were being correctly shared between all httpd processes, we
should only a single anon_map object, with a much higher reference
count.  Instead, there are multiple anon_map structures, each with a few
mappings referencing them.  This strongly implies that somebody has
unmapped the shared mutex, and created a new mapping on top of it.

Using DTrace, we can see some suspicious invocations of both the create
and cleanup of this type of mutex.

  1 125899 proc_mutex_proc_pthread_cleanup:entry
              libapr-1.so.0.3.9`proc_mutex_proc_pthread_cleanup
              libapr-1.so.0.3.9`apr_proc_mutex_cleanup+0x16
              libapr-1.so.0.3.9`apr_pool_clear+0x7bf
              httpd`main+0xc7d
              httpd`_start+0x7d

  1 125900 proc_mutex_proc_pthread_create:entry
              libapr-1.so.0.3.9`proc_mutex_proc_pthread_create
              libapr-1.so.0.3.9`apr_proc_mutex_create+0x98
              httpd`ap_mpm_run+0x82
              httpd`main+0xda8
              httpd`_start+0x7d

Since we know the address of the accept_mutex, it's actually possible to
catch the specific call to the cleanup and the munmap.  This looks bad.

destroying accept mutex 8198260

              libapr-1.so.0.3.9`proc_mutex_proc_pthread_cleanup
              libapr-1.so.0.3.9`apr_proc_mutex_cleanup+0x16
              libapr-1.so.0.3.9`apr_pool_clear+0x7bf
              httpd`main+0xc7d
              httpd`_start+0x7d

Unmapping mutex at fb3a0000

              libc.so.1`munmap+0x7
              libapr-1.so.0.3.9`apr_proc_mutex_cleanup+0x16
              libapr-1.so.0.3.9`apr_pool_clear+0x7bf
              httpd`main+0xc7d
              httpd`_start+0x7d

According to DTrace, we're calling proc_mutex_proc_pthread_cleanup on
the accept_mutex out of apr_pool_clear every time we go back to the top
of the for loop in main().

Here's an explanation of the code flow:

in server/main.c main():

    for (;;) {
        apr_hook_deregister_all();
(3) --> apr_pool_clear(pconf);

        for (mod = ap_prelinked_modules; *mod != NULL; mod++) {
            ap_register_hooks(*mod, pconf);
        }

        /* This is a hack until we finish the code so that it only reads
         * the config file once and just operates on the tree already in
         * memory.  rbb
         */
        ap_conftree = NULL;
        apr_pool_create(&ptemp, pconf);
        apr_pool_tag(ptemp, "ptemp");
        ap_server_root = def_server_root;
        server_conf = ap_read_config(process, ptemp, confname, &ap_conftree);
        if (!server_conf) {
            destroy_and_exit_process(process, 1);
        }
        /* sort hooks here to make sure pre_config hooks are sorted properly */
        apr_hook_sort_all();

        if (ap_run_pre_config(pconf, plog, ptemp) != OK) {
            ap_log_error(APLOG_MARK, APLOG_STARTUP |APLOG_ERR,
                         0, NULL, "Pre-configuration failed");
            destroy_and_exit_process(process, 1);
        }

        if (ap_process_config_tree(server_conf, ap_conftree, process->pconf,
                                   ptemp) != OK) {
            destroy_and_exit_process(process, 1);
        }
        ap_fixup_virtual_hosts(pconf, server_conf);
        ap_fini_vhost_config(pconf, server_conf);
        /*
         * Sort hooks again because ap_process_config_tree may have added
         * modules and hence hooks. This happens with mod_perl and modules
         * written in perl.
         */
        apr_hook_sort_all();
        apr_pool_clear(plog);
        if (ap_run_open_logs(pconf, plog, ptemp, server_conf) != OK) {
            ap_log_error(APLOG_MARK, APLOG_STARTUP |APLOG_ERR,
                         0, NULL, "Unable to open logs");
            destroy_and_exit_process(process, 1);
        }

        if (ap_run_post_config(pconf, plog, ptemp, server_conf) != OK) {
            ap_log_error(APLOG_MARK, APLOG_STARTUP |APLOG_ERR,
                         0, NULL, "Configuration Failed");
            destroy_and_exit_process(process, 1);
        }

        apr_pool_destroy(ptemp);
        apr_pool_lock(pconf, 1);

        ap_run_optional_fn_retrieve();

(1) --> if (ap_mpm_run(pconf, plog, server_conf))
            break;

        apr_pool_lock(pconf, 0);
    }

in server/mpm/prefork/prefork.c ap_mpm_run():

    int ap_mpm_run(apr_pool_t *_pconf, apr_pool_t *plog, server_rec *s)
    {
        int index;
        int remaining_children_to_start;
        apr_status_t rv;

        ap_log_pid(pconf, ap_pid_fname);

        first_server_limit = server_limit;
        if (changed_limit_at_restart) {
            ap_log_error(APLOG_MARK, APLOG_WARNING, 0, s,
                         "WARNING: Attempt to change ServerLimit "
                         "ignored during restart");
            changed_limit_at_restart = 0;
        }

        /* Initialize cross-process accept lock */
        ap_lock_fname = apr_psprintf(_pconf, "%s.%" APR_PID_T_FMT,
                                     ap_server_root_relative(_pconf,
ap_lock_fname),
                                     ap_my_pid);

(2) --> rv = apr_proc_mutex_create(&accept_mutex, ap_lock_fname,
                                   ap_accept_lock_mech, _pconf);
        <...>

in srclib/apr/locks/unix/proc_mutex.c proc_mutex_pthread_cleanup():

    static apr_status_t proc_mutex_proc_pthread_cleanup(void *mutex_)
    {
        apr_proc_mutex_t *mutex=mutex_;
        apr_status_t rv;

        if (mutex->curr_locked == 1) {
            if ((rv = pthread_mutex_unlock(mutex->pthread_interproc))) {
    #ifdef HAVE_ZOS_PTHREADS
                rv = errno;
    #endif
                return rv;
            }
        }
        /* curr_locked is set to -1 until the mutex has been created */
        if (mutex->curr_locked != -1) {
(4) -->     if ((rv = pthread_mutex_destroy(mutex->pthread_interproc))) {
    #ifdef HAVE_ZOS_PTHREADS
                rv = errno;
    #endif
                return rv;
            }
        }
(5) --> if (munmap((caddr_t)mutex->pthread_interproc, sizeof(pthread_mutex_t)))
{
            return errno;
        }
        return APR_SUCCESS;
    }

Each call to ap_mpm_run (1) generates a call to apr_proc_mutex_create
(2).  This means that every time through the for(;;) loop we call
ap_mpm_run, we'll be creating a new accept_mutex.  The exiting threads
end up getting stuck when ap_mpm_run returns, goes to the top of the for
loop and calls apr_pool_clear (3).  This causes the
proc_mutex_proc_pthread_cleanup function to be invoked on the
accept_mutex.  This results in the mutex being uninitialized (4) and
then unmapped (5).  Because there are other processes operating on this
shared mapping at the time that (4) and (5) occur, the master process
effectively zeros the mutex, and releases the mapping stranding the
forked worker processes with an uninitialized mutex.  Any attempt to
acquire this lock now blocks indefinitely, because the mutex is in an
undefined state.

I can think of a couple possible solutions, but there are probably more
clever ones available too.

 - Re use the accept_mutex across ap_mpm_calls instead of creating a new
   one each time.

 - Reference count the mutex and only call pthread_mutex_destroy when
   the reference count reaches zero.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[Bug 49504] Solaris 10/x64 worker graceful restart problem

Reply via email to