If I remember correctly, kmdb uses krs_cpustack from the kaif_cpusave_t
structure as its stack (one for each CPU).  krs_cpustack is just an
array at the end of the structure, and all the structures (one per CPU)
are allocated together.
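
Roughly, the layout is something like this (a sketch from memory; the
stack size and the surrounding fields are illustrative, not the actual
kaif definitions):

#define KRS_CPUSTACK_SIZE   (16 * 1024)   /* assumed size, not the real one */

typedef struct kaif_cpusave {
        /* ... saved register state, state flags, etc. ... */
        char krs_cpustack[KRS_CPUSTACK_SIZE];  /* per-CPU debugger stack */
} kaif_cpusave_t;

/*
 * One entry per CPU, allocated as a single contiguous block.  Since
 * stacks grow down, overflowing krs_cpustack walks backward into the
 * saved state at the start of the same entry, and then into the
 * previous CPU's entry.
 */
kaif_cpusave_t kaif_cpusave[NCPU];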

  So there's no redzone protection in that case: if the kmdb stack
overflows, it will actually trash other information in kaif_cpusave_t,
leading to very bad and random behavior.  (There is a recovery sequence,
kmdb_fault, but it can fail if the kaif_cpusave_t structure is
corrupted.)
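
  One cheap mitigation (just a sketch, not something kmdb does today)
would be a canary word at the low end of each krs_cpustack, checked
before the save area is trusted:

#define KRS_STACK_CANARY    0xbaddcafebaddcafeULL

/* Stacks grow down, so plant the canary at the lowest address. */
void
krs_canary_init(kaif_cpusave_t *save)
{
        *(uint64_t *)save->krs_cpustack = KRS_STACK_CANARY;
}

/* Returns 0 if the stack has grown past its low end. */
int
krs_canary_ok(const kaif_cpusave_t *save)
{
        return (*(const uint64_t *)save->krs_cpustack == KRS_STACK_CANARY);
}

That wouldn't stop the corruption, only detect it, but detection is
still better than the random downstream failures.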

alex.


On 02/26/09 12:08, Jonathan Adams wrote:
> On Thu, Feb 26, 2009 at 09:07:00PM +0100, max at bruningsystems.com wrote:
>> Jonathan Adams wrote:
>>> On Thu, Feb 26, 2009 at 11:42:59AM -0800, Edward Pilatowicz wrote:
>>>  
>>>> On Thu, Feb 26, 2009 at 01:42:24PM -0500, James Carlson wrote:
>>>>    
>>>>> I just spent a little over a day debugging a stack overflow problem in
>>>>> mdb itself.  It turned out to be a fairly simple problem -- I'd added
>>>>> a new dcmd, and one of the functions had a structure on the stack that
>>>>> turned out to be unexpectedly _huge_ (512K+) -- but the symptoms of
>>>>> the problem were fairly misleading and unexpected.  I saw panics that
>>>>> looked like this:
>>>>>
>>>>> kmdb ABORT: "../common/umem.c", line 1264: assertion failed: 
>>>>> sp->slab_cache == cp
>>>>> Debugger aborted
>>>>> Program terminated
>>>>> {2} ok boot
>>>>>
>>>>> It turns out that allocating big things on the stack inside mdb can be
>>>>> somewhat toxic.
>>>>>
>>>>> I fixed my problem by allocating the offending structure with
>>>>> mdb_alloc, but that raises a question: are there other instances of this
>>>>> problem hiding in here?  Could this be near the root of weird problems
>>>>> like CR 6766866?
>>>>>
>>>>> It seems to me that the compiler must (obviously) know how much
>>>>> storage it's reserving for auto variables.  Is there any way to find
>>>>> this out and enforce a limit?  That wouldn't fix the problem of
>>>>> nesting too deeply (or just recursing), but it'd at least catch
>>>>> obvious blunders before they turn into lengthy trials.
>>>>>
>>>>>      
>>>> IIRC, at some point, someone had a tool which could look at a kernel
>>>> panic stack trace and tell you the stack usage of each frame, but that
>>>> is only for post-mortem analysis.  AFAIK we don't have any way of
>>>> getting this information from the compiler, and we don't have any
>>>> tools that can do assembly analysis to determine it.
>>>>    
>>> It's pretty easy to look for subtractions from %rsp in the code:
>>>
>>> dis /kernel/kmdb/amd64/genunix | grep 'subq.*rsp' |
>>>    sed 's/\(.*\):.*subq.*\$\(0x[0-9a-f]*\),%rsp.*$/\1 \2/' |
>>>    while read func off; do printf "%5d %s\n" "$off" "$func";
>>>    done | sort -n | egrep -v ' walkers\+| dcmds\+'
>>>
>>> gives:
>>>
>>> ...
>>> 4392 threadlist+0x21
>>> 6592 pfile_callback+0x1d
>>> 7336 vmem+0x1d
>>> 8504 calloutid+0x21
>>>
>>> So calloutid is the largest stack user.  Something similar will work
>>> for sparc.  We could put something like this in the kmdb module build,
>>> and have a "max allowed" value (10k?).
>>>
>>> It wouldn't solve heavily recursive stuff, but maybe a guard page or
>>> two could protect against that.
>>>  
>> There is a redzone page at the low end of all(?) kernel stacks.  When
>> it is touched, the system panics (better than overwriting other
>> memory).  See the code for segkp_fault() in uts/common/vm/seg_kp.c.
> 
> But kmdb has its own stacks.  I don't know if they have redzone pages.
> 
> Cheers,
> - jonathan
> 