[Moving this back to the list.]
On Mon, 12 Nov 2001, Aaron Bannert wrote:

> So how'd you get the trace? I completely avoid multithreaded programming
> on linux for this very reason.

I think I've started to get the hang of this.  It's not as bad as I
thought.  What I did was attach gdb to the pid of the process (as opposed
to the pid of an individual thread... the trick is figuring out which is
which).  I then ran the test that was failing, and sure enough, gdb
trapped my SEGV, took me to the thread that faulted, and I got a
backtrace, no problemo.  <shrug>

> > Program received signal SIGSEGV, Segmentation fault.
> > 0x4003bfe0 in apr_pool_clear (a=0x8183ed4) at apr_pools.c:957
> > 957         free_blocks(a->first->h.next);
> > (gdb) bt
> > #0  0x4003bfe0 in apr_pool_clear (a=0x8183ed4) at apr_pools.c:957
> > #1  0x80bee97 in core_output_filter (f=0x817a214, b=0x0) at core.c:3217
> > #2  0x80b8b65 in ap_pass_brigade (next=0x817a214, bb=0x817a264)
> >     at util_filter.c:276
> > #3  0x80b77ac in ap_flush_conn (c=0x8179f84) at connection.c:138
> > #4  0x80b7805 in ap_lingering_close (dummy=0x8179f84) at connection.c:175
> > #5  0x4003be2a in run_cleanups (c=0x817a244) at apr_pools.c:833
> > #6  0x4003bfbf in apr_pool_clear (a=0x8179e84) at apr_pools.c:949
> > #7  0x4003c02c in apr_pool_destroy (a=0x8179e84) at apr_pools.c:995
> > #8  0x80ad9dd in worker_thread (thd=0x815273c, dummy=0x81d9ad8)
> >     at worker.c:723
> > #9  0x40036cbe in dummy_worker (opaque=0x815273c) at thread.c:122
> > #10 0x401d9065 in pthread_start_thread (arg=0xbf3ffc00) at manager.c:274

> Also, do we have any idea what's going on here?  worker_thread() is only
> going to destroy the pool when it is time to quit, and no sooner.  That
> means that w_m_e has turned to 1 and these guys are popping out of
> the listen queue.

Naw.  The pool that's being destroyed is ptrans, at line 723 of worker.c
(inside the while (!w_m_e) loop).  When ptrans gets destroyed, it triggers
Ryan's ap_lingering_close c->pool cleanup, which sends a flush bucket down
the filter chain.
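For anyone else who wants to reproduce the attach trick on LinuxThreads, the session looks roughly like this (the pid 12345 and the bare httpd path are made up for illustration; each thread shows up in ps with its own pid, and you want the parent process, not one of the per-thread entries):

```
$ ps -C httpd -o pid,ppid,cmd   # find the process pid (not a thread pid)
$ gdb httpd 12345               # attach to the running process by pid
(gdb) continue                  # let it run; provoke the failing test
Program received signal SIGSEGV, Segmentation fault.
(gdb) bt                        # backtrace of the thread that faulted
```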
It gets weird in core_output_filter(), where we have a brigade with just a
flush bucket in it.  That means we drop through to the else case on line
3191 of core.c, call writev_it_all(), and destroy the brigade with the
flush bucket in it.  But then somehow ctx->subpool is non-NULL and
ctx->subpool_has_stuff is true, so we try to clear ctx->subpool; I'm
guessing that's where the problem is, since that pool probably already got
cleared during the c->pool cleanup.

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   [EMAIL PROTECTED]
   Charlottesville, VA
