(Oliver and I had an off-list discussion; I got his permission to forward the useful bits back to the list, for the curious and the record. Not everyone knows about ::vmem_seg and its useful options.)
On 8/12/07, Oliver Yang <Oliver.Yang at sun.com> wrote:
>
> Hi All,
>
> My DR testing failed after about 6000 DR loops, and I've got lots of
> vmem allocation failures during my DR testing; it seems the space of
> "device" is exhausted.
>
> > ::vmem ! grep device
> fffffffecac54000 device     1073741824   1073741824    1326624  109685
>
> I also got lots of warning messages about pcihp and pcicfg, since these
> drivers are important to DR.  I think my DR failures were caused by the
> device vmem exhaustion.
>
> Aug 12 12:13:30 pcihp: WARNING: pcihp (pcie_pci3): failed to attach one
> or more drivers for the card in the slot pcie1
> Aug 12 12:13:32 pcicfg: WARNING: pcicfg: cannot map config space, to get
> map type
>
> And I didn't find any memory leaks with the ::findleaks dcmd, and
> physical memory had 90% free at that time.

::findleaks can't find vmem leaks in anything but the kmem_oversize arena.

> Now I have 3 questions about this issue:
>
> 1. How can we know which kernel modules or drivers are using a given
> vmem arena?  Can we check the vmem allocation info with mdb?

Yes; if you want the full stack trace of every allocation, you can do
(on a machine with kmem_flags=0xf):

> addr::walk vmem_alloc | ::vmem_seg -v

where addr is the address of the vmem arena (in the output above,
fffffffecac54000).

A first cut at determining who's leaking would be to do:

> addr::walk vmem_alloc | ::vmem_seg -v ! sort | uniq -c | sort -n

The first few will be the allocation functions; the ones after that
should point you in the right direction.

To see all segments with a particular function or function offset in
their stack traces, use the '-c' option:

> addr::walk vmem_alloc | ::vmem_seg -v -c func
> addr::walk vmem_alloc | ::vmem_seg -v -c $[func+offset]

(note the use of $[] for computed arguments)

> 2. The device vmem arena seems to be important for the vmem allocations
> required by driver attach, doesn't it?

Looks like it's used to map device memory:

http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/i86pc/os/startup.c#1649
http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/i86pc/os/startup.c#2518

Maybe something isn't cleaning up after itself?  That's 1 gigabyte of VA
that's being used up.

--- cut here ---

Oliver responded:

I found lots of e1000g driver stack traces by using ::vmem_seg -v; one of
them looks like this:

fffffffed8e74910 ALLC ffffff028ea16000 ffffff028ea17000 4096 fffffffed0b60500 1ee9c045cd
                 vmem_hash_insert+0x8b
                 vmem_seg_alloc+0xbd
                 vmem_alloc+0x129
                 device_arena_alloc+0x27
                 rootnex_map_regspec+0x102
                 rootnex_map+0x126
                 ddi_map+0x4d
                 npe_bus_map+0x3a9
                 pepb_bus_map+0x31
                 ddi_map+0x4d
                 ddi_regs_map_setup+0xc4
                 pci_config_setup+0x66
                 e1000g_attach+0xb1
                 devi_attach+0x7f
                 attach_node+0x98
                 i_ndi_config_node+0x9d
                 i_ddi_attachchild+0x3f
                 devi_attach_node+0x7f
                 devi_config_one+0x2bd
                 ndi_devi_config_one+0xb0

Since my driver was already detached, I think it should be a bug.

...

After some investigation, I found that even after a single driver attach
and detach, I can still find the e1000g stack trace above.  But e1000g
does call pci_config_teardown in its e1000g_detach code; I have verified
that with dtrace and mdb while running one driver detach.  I think it
shouldn't be an e1000g driver bug; it might be a DDI or bus driver bug,
and I think we'd better file a bug against the DDI routines for the
initial evaluation.

---- cut here ---

I agreed that it looked like the driver was doing the right thing.
Oliver, do you have a bugid for this?

Cheers,
- jonathan
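
For reference, the pairing Oliver verified looks roughly like the sketch
below in a driver's attach/detach entry points.  This is only a minimal
illustration of the pci_config_setup()/pci_config_teardown() pattern; the
mydrv_* names and the soft-state structure are hypothetical, not the
actual e1000g code.

#include <sys/types.h>
#include <sys/kmem.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

/* hypothetical per-instance state */
typedef struct mydrv_state {
    dev_info_t       *dip;
    ddi_acc_handle_t cfg_handle;    /* PCI config-space access handle */
} mydrv_state_t;

static int
mydrv_attach(dev_info_t *dip, ddi_attach_cmd_t cmd)
{
    mydrv_state_t *sp;

    if (cmd != DDI_ATTACH)
        return (DDI_FAILURE);

    sp = kmem_zalloc(sizeof (*sp), KM_SLEEP);
    sp->dip = dip;

    /*
     * Map PCI config space.  On x86 this is the path that shows up as
     * rootnex_map_regspec()/device_arena_alloc() in the stack above.
     */
    if (pci_config_setup(dip, &sp->cfg_handle) != DDI_SUCCESS) {
        kmem_free(sp, sizeof (*sp));
        return (DDI_FAILURE);
    }

    ddi_set_driver_private(dip, sp);
    return (DDI_SUCCESS);
}

static int
mydrv_detach(dev_info_t *dip, ddi_detach_cmd_t cmd)
{
    mydrv_state_t *sp = ddi_get_driver_private(dip);

    if (cmd != DDI_DETACH)
        return (DDI_FAILURE);

    /* this should unmap config space and release the device-arena VA */
    pci_config_teardown(&sp->cfg_handle);
    kmem_free(sp, sizeof (*sp));
    return (DDI_SUCCESS);
}

If the teardown side runs and the device-arena VA still isn't returned,
that points at the layers below the driver (rootnex/npe rather than
e1000g), which is consistent with filing the bug against the DDI/bus
code.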