On 09 Aug 2001 08:37:23 -0400, Jeff Trawick wrote:
> It is frustrating to try to guess what worker threads are doing at a
> given instance. You can get a bit of info out of the ps output (at
> least on some platforms) but I'm generally left making a lot of
> assumptions (and am often wrong :) ), so I wrote some code to tell me
> exactly what the worker threads are doing.
>
my 2c:
would it be worthwhile adding these extra states to the scoreboard?
Then you could use something like mod_status to see what is going on
(maybe with a new directive to enable them, the way ExtendedStatus does).
I'd also like to see this patch as a separate module, so that if I include
it, a new process is spawned and I can get status-like info out of it.
BTW, there is a hook (get_mgmt_items) which each of the MPMs could
implement to dump MPM-specific info to mod_status.
..Ian
> I'm curious if anyone else would find this useful. The concept could
> be applied to other MPMs as well; it is probably most useful for
> threaded MPMs, where it is problematic to find out what the workers
> are doing at a particular time.
>
> When compiled with MPM_DEBUG:
>
> At startup, a debug process is forked off; this debug process will
> report the state of all server processes/threads if you telnet to
> the right port; the debug process switches to a different process
> group at startup so that if you want to freeze Apache and all of its
> threads ("kill -STOP -`cat myprefix/logs/httpd.pid`") it will still
> be around to tell you the state of things
>
> Worker threads set a footprint in the worker_score which indicates
> what sort of processing they are doing; for debugging an MPM, it is
> important to know if they are blocked in mutex, accept, poll,
> processing a connection, doing lingering close, and a few other
> tasks.
>
> When not compiled with MPM_DEBUG:
>
> no change
>
> Here is a sample output from the debug process:
>
> $ telnet 0.0.0.0 10101
> Trying 0.0.0.0...
> Connected to 0.0.0.0.
> Escape character is '^]'.
> server process 0, pid 29859, generation 0, not quiescing
> worker 0: processing connection
> worker 1: processing connection
> worker 2: processing connection
> worker 3: processing connection
> worker 4: processing connection
> worker 5: processing connection
> worker 6: processing connection
> worker 7: processing connection
> worker 8: processing connection
> worker 9: processing connection
> worker 10: processing connection
> worker 11: processing connection
> worker 12: processing connection
> worker 13: processing connection
> worker 14: processing connection
> worker 15: processing connection
> worker 16: processing connection
> worker 17: processing connection
> worker 18: processing connection
> worker 19: processing connection
> server process 1, pid 30078, generation 0, not quiescing
> worker 0: processing connection
> worker 1: processing connection
> worker 2: processing connection
> worker 3: processing connection
> worker 4: processing connection
> worker 5: processing connection
> worker 6: processing connection
> worker 7: processing connection
> worker 8: processing connection
> worker 9: processing connection
> worker 10: processing connection
> worker 11: processing connection
> worker 12: processing connection
> worker 13: processing connection
> worker 14: processing connection
> worker 15: processing connection
> worker 16: processing connection
> worker 17: processing connection
> worker 18: processing connection
> worker 19: processing connection
> server process 2, pid 30102, generation 0, not quiescing
> worker 0: processing connection
> worker 1: processing connection
> worker 2: processing connection
> worker 3: processing connection
> worker 4: processing connection
> worker 5: processing connection
> worker 6: processing connection
> worker 7: processing connection
> worker 8: processing connection
> worker 9: processing connection
> worker 10: processing connection
> worker 11: exiting thread
> worker 12: exiting thread
> worker 13: exiting thread
> worker 14: exiting thread
> worker 15: exiting thread
> worker 16: exiting thread
> worker 17: exiting thread
> worker 18: exiting thread
> worker 19: exiting thread
> server process 3, pid 29931, generation 0, not quiescing
> worker 0: processing connection
> worker 1: processing connection
> worker 2: processing connection
> worker 3: processing connection
> worker 4: processing connection
> worker 5: processing connection
> worker 6: processing connection
> worker 7: processing connection
> worker 8: processing connection
> worker 9: processing connection
> worker 10: processing connection
> worker 11: processing connection
> worker 12: processing connection
> worker 13: processing connection
> worker 14: processing connection
> worker 15: processing connection
> worker 16: processing connection
> worker 17: processing connection
> worker 18: processing connection
> worker 19: processing connection
> server process 4, pid 29954, generation 0, not quiescing
> worker 0: processing connection
> worker 1: processing connection
> worker 2: processing connection
> worker 3: processing connection
> worker 4: processing connection
> worker 5: processing connection
> worker 6: processing connection
> worker 7: processing connection
> worker 8: processing connection
> worker 9: processing connection
> worker 10: processing connection
> worker 11: processing connection
> worker 12: processing connection
> worker 13: processing connection
> worker 14: processing connection
> worker 15: processing connection
> worker 16: processing connection
> worker 17: processing connection
> worker 18: processing connection
> worker 19: processing connection
> server process 5, pid 29977, generation 0, not quiescing
> worker 0: processing connection
> worker 1: processing connection
> worker 2: processing connection
> worker 3: processing connection
> worker 4: processing connection
> worker 5: processing connection
> worker 6: processing connection
> worker 7: processing connection
> worker 8: processing connection
> worker 9: processing connection
> worker 10: processing connection
> worker 11: processing connection
> worker 12: processing connection
> worker 13: processing connection
> worker 14: processing connection
> worker 15: processing connection
> worker 16: processing connection
> worker 17: processing connection
> worker 18: processing connection
> worker 19: processing connection
> server process 6, pid 30001, generation 0, quiescing
> worker 0: exiting thread
> worker 1: exiting thread
> worker 2: exiting thread
> worker 3: exiting thread
> worker 4: exiting thread
> worker 5: exiting thread
> worker 6: exiting thread
> worker 7: exiting thread
> worker 8: exiting thread
> worker 9: exiting thread
> worker 10: exiting thread
> worker 11: exiting thread
> worker 12: exiting thread
> worker 13: exiting thread
> worker 14: exiting thread
> worker 15: exiting thread
> worker 16: exiting thread
> worker 17: exiting thread
> worker 18: exiting thread
> worker 19: exiting thread
> server process 7, pid 30024, generation 0, quiescing
> worker 0: exiting thread
> worker 1: exiting thread
> worker 2: exiting thread
> worker 3: exiting thread
> worker 4: exiting thread
> worker 5: exiting thread
> worker 6: exiting thread
> worker 7: exiting thread
> worker 8: exiting thread
> worker 9: exiting thread
> worker 10: exiting thread
> worker 11: exiting thread
> worker 12: exiting thread
> worker 13: exiting thread
> worker 14: exiting thread
> worker 15: exiting thread
> worker 16: exiting thread
> worker 17: exiting thread
> worker 18: exiting thread
> worker 19: exiting thread
> Connection closed by foreign host.
>
> This is not the most interesting scenario imaginable, but it shows the
> general idea.
>
> It would be nice to represent the information in a more concise
> manner.
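One low-tech way to condense it is to collapse the per-worker lines into per-state counts with standard tools; a sketch (fed here from an inline sample rather than the debug port):

```shell
# collapse "worker N: <state>" lines into per-state counts;
# in practice, pipe the debug port's output in instead of printf
printf 'worker 0: processing connection\nworker 1: processing connection\nworker 2: exiting thread\n' \
  | sed -n 's/^worker [0-9]*: //p' | sort | uniq -c
```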
>
> The debug process does not exit with Apache. Sometimes this is good
> (you can still retrieve the info in case Apache bailed out improperly
> and/or you want to verify that all the worker threads went down
> cleanly) but sometimes it is just a nuisance (you have to kill it
> manually before restarting Apache).
>
> Another idea would be to let the debug process handle a normal HTTP
> request on the special port. Then you could send a request for
> mod_status output there. Unfortunately, AFAIK there is no current way
> for mod_status to let the MPM format MPM-specific information for the
> report. In fact there doesn't seem to be a way for MPMs to decide
> what goes in the scoreboard.
>
> Index: include/scoreboard.h
> ===================================================================
> RCS file: /home/cvspublic/httpd-2.0/include/scoreboard.h,v
> retrieving revision 1.30
> diff -u -r1.30 scoreboard.h
> --- include/scoreboard.h 2001/08/01 06:23:00 1.30
> +++ include/scoreboard.h 2001/08/09 12:21:43
> @@ -164,6 +164,9 @@
> server_rec *vhostrec; /* What virtual host is being accessed? */
> /* SEE ABOVE FOR SAFE USAGE! */
> worker_score *next;
> +#ifdef MPM_DEBUG
> + int footprint;
> +#endif
> };
>
> typedef struct {
> Index: server/mpm/threaded/threaded.c
> ===================================================================
> RCS file: /home/cvspublic/httpd-2.0/server/mpm/threaded/threaded.c,v
> retrieving revision 1.55
> diff -u -r1.55 threaded.c
> --- server/mpm/threaded/threaded.c 2001/08/05 18:08:49 1.55
> +++ server/mpm/threaded/threaded.c 2001/08/09 12:21:47
> @@ -134,6 +134,120 @@
> apr_threadattr_t *threadattr;
> } thread_starter;
>
> +#ifdef MPM_DEBUG
> +#define SETWFP(_ws_, _val_) \
> +(_ws_)->footprint = (_val_)
> +enum WFP_vals
> +{
> + WFP_ENTER_LOOP = 1,
> + WFP_WAIT_ACCEPTMUTEX,
> + WFP_WAIT_POLL,
> + WFP_CHECK_POD,
> + WFP_ENTER_ACCEPT,
> + WFP_RELEASE_ACCEPTMUTEX,
> + WFP_PROCESS_SOCKET,
> + WFP_LINGERING_CLOSE,
> + WFP_SOCKET_PROCESSED,
> + WFP_CLEAR_PTRANS,
> + WFP_DESTROY_TPOOL,
> + WFP_EXIT_THREAD
> +};
> +
> +static const char *WFP_strings[] =
> +{
> + "BAD FOOTPRINT",
> + "entering main loop",
> + "waiting for accept mutex",
> + "polling",
> + "checking pod",
> + "in accept",
> + "releasing accept mutex",
> + "processing connection",
> + "lingering close",
> + "after processing connection",
> + "clearing ptrans pool",
> + "destroying tpool pool",
> + "exiting thread"
> +};
> +
> +#define FATAL(fncall) \
> +{ \
> + apr_status_t rv; \
> + rv = fncall; \
> + if (rv != APR_SUCCESS) { \
> + ap_log_error(APLOG_MARK, APLOG_ALERT, \
> + rv, ap_server_conf, \
> + #fncall); \
> + exit(1); \
> + } \
> +}
> +
> +#define WARN(rv, fncall) \
> +{ \
> + rv = fncall; \
> + if (rv != APR_SUCCESS) { \
> + ap_log_error(APLOG_MARK, APLOG_WARNING, \
> + rv, ap_server_conf, \
> + #fncall); \
> + } \
> +}
> +
> +static void start_debug_process(void)
> +{
> + if (!fork()) {
> + /* debugger process */
> + apr_pool_t *p;
> + pid_t mypid = getpid();
> + apr_socket_t *sock, *connection;
> + apr_sockaddr_t *sa;
> + int i, j;
> + process_score *ps;
> + worker_score *ws;
> + char buf[2048];
> + apr_size_t len;
> + apr_status_t rv;
> +
> + /* change process groups so that we're unaffected if Apache is
> + * frozen via SIGSTOP
> + */
> + setpgid(mypid, 0);
> +
> + apr_pool_create(&p, NULL);
> + FATAL(apr_sockaddr_info_get(&sa, "127.0.0.1", APR_UNSPEC, 10101, 0, p));
> + FATAL(apr_socket_create(&sock, sa->family, SOCK_STREAM, p));
> + FATAL(apr_setsocketopt(sock, APR_SO_REUSEADDR, 1));
> + FATAL(apr_bind(sock, sa));
> + FATAL(apr_listen(sock, 5));
> + while (1) {
> + WARN(rv, apr_accept(&connection, sock, p));
> + if (rv != APR_SUCCESS)
> + continue;
> + for (i = 0; i < HARD_SERVER_LIMIT; i++) {
> + ps = &ap_scoreboard_image->parent[i];
> + len = apr_snprintf(buf, sizeof buf,
> +                                   "server process %d, pid %" APR_OS_PROC_T_FMT ", generation %d, %s\n",
> + i, ps->pid, ps->generation,
> + ps->quiescing ? "quiescing" : "not quiescing");
> + WARN(rv, apr_send(connection, buf, &len));
> + for (j = 0; j < HARD_THREAD_LIMIT; j++) {
> + ws = &ap_scoreboard_image->servers[i][j];
> + if (ws->footprint != 0) { /* if ever initialized */
> + len = apr_snprintf(buf, sizeof buf,
> + "worker %d: %s\n",
> + j, WFP_strings[ws->footprint]);
> + WARN(rv, apr_send(connection, buf, &len));
> + }
> + }
> + }
> + apr_socket_close(connection);
> + }
> + }
> +}
> +
> +#else /* MPM_DEBUG */
> +#define SETWFP(_ws_, _val_)
> +#endif /* MPM_DEBUG */
> +
> /*
> * The max child slot ever assigned, preserved across restarts. Necessary
> * to deal with MaxClients changes across SIGWINCH restarts. We use this
> @@ -452,6 +566,9 @@
> conn_rec *current_conn;
> long conn_id = AP_ID_FROM_CHILD_THREAD(my_child_num, my_thread_num);
> int csd;
> +#ifdef MPM_DEBUG
> + worker_score *ws = &ap_scoreboard_image->servers[my_child_num][my_thread_num];
> +#endif
>
> (void) apr_os_sock_get(&csd, sock);
>
> @@ -470,6 +587,7 @@
> current_conn = ap_new_connection(p, ap_server_conf, sock, conn_id);
> if (current_conn) {
> ap_process_connection(current_conn);
> + SETWFP(ws, WFP_LINGERING_CLOSE);
> ap_lingering_close(current_conn);
> }
> }
> @@ -535,6 +653,9 @@
> int curr_pollfd, last_pollfd = 0;
> apr_pollfd_t *pollset;
> apr_status_t rv;
> +#ifdef MPM_DEBUG
> + worker_score *ws = &ap_scoreboard_image->servers[process_slot][thread_slot];
> +#endif
>
> free(ti);
>
> @@ -548,6 +669,8 @@
> for(n=0 ; n <= num_listensocks ; ++n)
> apr_poll_socket_add(pollset, listensocks[n], APR_POLLIN);
>
> + SETWFP(ws, WFP_ENTER_LOOP);
> +
> /* TODO: Switch to a system where threads reuse the results from earlier
> poll calls - manoj */
> while (1) {
> @@ -558,6 +681,7 @@
>
> (void) ap_update_child_status(process_slot, thread_slot, SERVER_READY,
> (request_rec *) NULL);
> + SETWFP(ws, WFP_WAIT_ACCEPTMUTEX);
> if ((rv = SAFE_ACCEPT(apr_lock_acquire(accept_mutex)))
> != APR_SUCCESS) {
> ap_log_error(APLOG_MARK, APLOG_EMERG, rv, ap_server_conf,
> @@ -570,6 +694,7 @@
> apr_status_t ret;
> apr_int16_t event;
>
> + SETWFP(ws, WFP_WAIT_POLL);
> ret = apr_poll(pollset, &n, -1);
> if (ret != APR_SUCCESS) {
> if (APR_STATUS_IS_EINTR(ret)) {
> @@ -589,6 +714,7 @@
> if (event & APR_POLLIN) {
> /* A process got a signal on the shutdown pipe. Check if we're
> * the lucky process to die. */
> + SETWFP(ws, WFP_CHECK_POD);
> check_pipe_of_death();
> continue;
> }
> @@ -617,11 +743,13 @@
> }
> got_fd:
> if (!workers_may_exit) {
> + SETWFP(ws, WFP_ENTER_ACCEPT);
> if ((rv = apr_accept(&csd, sd, ptrans)) != APR_SUCCESS) {
> csd = NULL;
> ap_log_error(APLOG_MARK, APLOG_ERR, rv, ap_server_conf,
> "apr_accept");
> }
> + SETWFP(ws, WFP_RELEASE_ACCEPTMUTEX);
> if ((rv = SAFE_ACCEPT(apr_lock_release(accept_mutex)))
> != APR_SUCCESS) {
> ap_log_error(APLOG_MARK, APLOG_EMERG, rv, ap_server_conf,
> @@ -630,7 +758,9 @@
> workers_may_exit = 1;
> }
> if (csd != NULL) {
> + SETWFP(ws, WFP_PROCESS_SOCKET);
> process_socket(ptrans, csd, process_slot, thread_slot);
> + SETWFP(ws, WFP_SOCKET_PROCESSED);
> requests_this_child--;
> }
> }
> @@ -644,9 +774,11 @@
> }
> break;
> }
> + SETWFP(ws, WFP_CLEAR_PTRANS);
> apr_pool_clear(ptrans);
> }
>
> + SETWFP(ws, WFP_DESTROY_TPOOL);
> apr_pool_destroy(tpool);
>     ap_update_child_status(process_slot, thread_slot, (dying) ? SERVER_DEAD : SERVER_GRACEFUL,
> (request_rec *) NULL);
> @@ -667,6 +799,8 @@
> }
> apr_lock_release(worker_thread_count_mutex);
>
> + SETWFP(ws, WFP_EXIT_THREAD);
> +
> return NULL;
> }
>
> @@ -1223,6 +1357,10 @@
> if (!is_graceful) {
> ap_run_pre_mpm(pconf, SB_SHARED);
> }
> +
> +#ifdef MPM_DEBUG
> + start_debug_process();
> +#endif
>
> set_signals();
> /* Don't thrash... */
>
> --
> Jeff Trawick | [EMAIL PROTECTED] | PGP public key at web site:
> http://www.geocities.com/SiliconValley/Park/9289/
> Born in Roswell... married an alien...
--
Ian Holsman
Performance Measurement & Analysis
CNET Networks - 415 364-8608