On Thu, Feb 26, 2009 at 11:42:59AM -0800, Edward Pilatowicz wrote:
> On Thu, Feb 26, 2009 at 01:42:24PM -0500, James Carlson wrote:
> > I just spent a little over a day debugging a stack overflow problem in
> > mdb itself. It turned out to be a fairly simple problem -- I'd added
> > a new dcmd, and one of the functions had a structure on the stack that
> > turned out to be unexpectedly _huge_ (512K+) -- but the symptoms of
> > the problem were fairly misleading and unexpected. I saw panics that
> > looked like this:
> >
> > kmdb ABORT: "../common/umem.c", line 1264: assertion failed: sp->slab_cache == cp
> > Debugger aborted
> > Program terminated
> > {2} ok boot
> >
> > It turns out that allocating big things on the stack inside mdb can be
> > somewhat toxic.
> >
> > I fixed my problem by allocating the offending structure with
> > mdb_alloc, but that begs a question: are there other instances of this
> > problem hiding in here? Could this be near the root of weird problems
> > like CR 6766866?
> >
> > It seems to me that the compiler must (obviously) know how much
> > storage it's reserving for auto variables. Is there any way to find
> > this out and enforce a limit? That wouldn't fix the problem of
> > nesting too deeply (or just recursing), but it'd at least catch
> > obvious blunders before they turn into lengthy trials.
> >
>
> iirc, at some point, someone had a tool which could look at a kernel
> panic stack trace and tell you the stack usage of each frame. but that
> is only for post mortem analysis. afaik we don't have any way of
> getting this information from the compiler, and we don't have any tools
> that can do assembly analysis to determine this information.

It's pretty easy to look for subtractions from %rsp in the code:

    dis /kernel/kmdb/amd64/genunix | grep 'subq.*rsp' |
        sed 's/\(.*\):.*subq.*\$\(0x[0-9a-f]*\),%rsp.*$/\1 \2/' |
        while read func off; do printf "%5d %s\n" "$off" "$func"; done |
        sort -n | egrep -v ' walkers\+| dcmds\+'

gives:

    ...
     4392 threadlist+0x21
     6592 pfile_callback+0x1d
     7336 vmem+0x1d
     8504 calloutid+0x21

So calloutid is the largest stack user. Something similar will work for
sparc.

We could put something like this in the kmdb module build, and have a "max
allowed" value (10k?). It wouldn't solve heavily recursive stuff, but maybe
a guard page or two could protect against that.

Cheers,
- jonathan
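
For illustration only, here is a rough sketch of what the "max allowed"
build check could look like. The function name check_stack_frames is
made up for this example, and it assumes the disassembly arrives on stdin
with lines shaped like "func+0x21: ... subq $0x1128,%rsp"; the real dis
output may need a different pattern. Like the pipeline above, it only sees
fixed frame sizes, not recursion or deep nesting.

```shell
#!/bin/sh
# Hypothetical helper (not part of any existing build): read disassembly
# on stdin and report every "subq $imm,%rsp" whose immediate exceeds the
# limit given as $1. Returns non-zero if any frame is over the limit.
check_stack_frames() {
    max=$1
    # Keep only lines that subtract a hex immediate from %rsp and reduce
    # them to "location size". The line shape is an assumption.
    sed -n 's/^\(.*\):.*subq.*\$\(0x[0-9a-f]*\),%rsp.*$/\1 \2/p' |
    (
        rc=0
        while read -r func off; do
            # Skip the walkers/dcmds entries, as the original pipeline does.
            case $func in
                walkers+*|dcmds+*) continue ;;
            esac
            size=$(printf '%d' "$off")    # hex immediate to decimal
            if [ "$size" -gt "$max" ]; then
                printf '%5d %s\n' "$size" "$func"
                rc=1
            fi
        done
        exit "$rc"    # exits only this subshell; becomes the function status
    )
}
```

A build rule could then run something like
"dis /kernel/kmdb/amd64/genunix | check_stack_frames 10240" and fail the
build when it returns non-zero, using the 10k figure floated above as the
limit.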