On Thu, Feb 26, 2009 at 11:42:59AM -0800, Edward Pilatowicz wrote:
> On Thu, Feb 26, 2009 at 01:42:24PM -0500, James Carlson wrote:
> > I just spent a little over a day debugging a stack overflow problem in
> > mdb itself.  It turned out to be a fairly simple problem -- I'd added
> > a new dcmd, and one of the functions had a structure on the stack that
> > turned out to be unexpectedly _huge_ (512K+) -- but the symptoms of
> > the problem were fairly misleading and unexpected.  I saw panics that
> > looked like this:
> >
> > kmdb ABORT: "../common/umem.c", line 1264: assertion failed: sp->slab_cache == cp
> > Debugger aborted
> > Program terminated
> > {2} ok boot
> >
> > It turns out that allocating big things on the stack inside mdb can be
> > somewhat toxic.
> >
> > I fixed my problem by allocating the offending structure with
> > mdb_alloc, but that begs a question: are there other instances of this
> > problem hiding in here?  Could this be near the root of weird problems
> > like CR 6766866?
> >
> > It seems to me that the compiler must (obviously) know how much
> > storage it's reserving for auto variables.  Is there any way to find
> > this out and enforce a limit?  That wouldn't fix the problem of
> > nesting too deeply (or just recursing), but it'd at least catch
> > obvious blunders before they turn into lengthy trials.
> >
> 
> iirc, at some point, someone had a tool which could look at a kernel
> panic stack trace and tell you the stack usage of each frame.  but that
> is only for post mortem analysis.  afaik we don't have any way of
> getting this information from the compiler, and we don't have any tools
> that can do assembly analysis to determine this information.

It's pretty easy to look for subtractions from %rsp in the code:

dis /kernel/kmdb/amd64/genunix | grep 'subq.*rsp' |
    sed 's/\(.*\):.*subq.*\$\(0x[0-9a-f]*\),%rsp.*$/\1 \2/' |
    while read func off; do printf "%5d %s\n" "$off" "$func";
    done | sort -n | egrep -v ' walkers\+| dcmds\+'

gives:

...
 4392 threadlist+0x21
 6592 pfile_callback+0x1d
 7336 vmem+0x1d
 8504 calloutid+0x21

So calloutid is the largest stack user.  Something similar will work
for sparc.  We could put something like this in the kmdb module build,
and have a "max allowed" value (10k?).
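
A build-time check along those lines might look roughly like the sketch
below.  The function name check_stack_frames and the 10240-byte default are
made up for illustration (the default just mirrors the "10k?" guess above),
and it only sees frames that are set up with a single
"subq $0x...,%rsp", so treat it as a starting point rather than a finished
tool:

```shell
# Hypothetical helper, not part of any existing build: reads disassembly on
# stdin, reports every "subq $0xN,%rsp" whose immediate exceeds the limit
# (in bytes), and returns non-zero if at least one frame is over the limit.
check_stack_frames() {
    limit=${1:-10240}    # assumed default, taken from the "10k?" guess

    # Reduce each matching line to "<function+offset> <hex size>".
    sed -n 's/^\([^:]*\):.*subq.*\$\(0x[0-9a-f]*\),%rsp.*$/\1 \2/p' |
    (
        status=0
        while read -r func off; do
            # $off is known to be a plain hex constant; convert to decimal.
            size=$(($off))
            if [ "$size" -gt "$limit" ]; then
                printf '%5d %s exceeds %d bytes\n' "$size" "$func" "$limit"
                status=1
            fi
        done
        # Explicit subshell, so this only ends the pipeline stage.
        exit "$status"
    )
}
```

It could then be hooked into the module build with something like
"dis /kernel/kmdb/amd64/genunix | check_stack_frames 10240 || exit 1",
with exemptions for known large frames handled however the build prefers.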

It wouldn't solve heavily recursive stuff, but maybe a guard page or two could
protect against that.

Cheers,
- jonathan


