mbuf leak in OpenBSD 3.6?

R Ginn Thu, 09 Jun 2005 17:09:50 -0700

Hi,
OpenBSD 3.6 (I'm running i386) seems to have a memory leak in
its use of mbufs for network traffic.  The default number of
mbuf clusters (kern.maxclusters) is fine until I run a series of
dump commands to a tape drive on a remote system.  After the dump
completes, the number of mbufs in use remains high.  Each time I
run another dump, the number climbs.  Soon I run out of them and
the system locks all ethernet traffic (which hangs all the other
systems depending on this one).  Increasing the kern.maxclusters
at this point unlocks the system (although the dump terminates at
that point).


Fortunately, when it hangs, it spits out a message to indicate
that it ran out of mbuf clusters and to increase kern.maxclusters.
BTW, kudos to whoever put that message and suggestion in; it is
a great/necessary feature that is so often missing in products.

Note that after the dump completes, there are no extra processes
left (the # of processes before I run the rdump = the # of processes
after the rdump completes).

I checked with ipcs to see if dump was using any shared memory
but, as expected, it wasn't, and no segments were in use.

Here is the dump command being used:

  dump 0udbsf 54000 64 96000 [EMAIL PROTECTED]:/dev/nrst0 /

Before the dump, 40 mbufs and 33 mbuf clusters are in use.
After the dump, 437 mbufs and 146 mbuf clusters are in use.
Before a 2nd dump, 438 mbufs and 148 mbuf clusters are in use.
After a 2nd dump, 4329 mbufs and 1197 mbuf clusters are in use.
Before a 3rd dump, 4330 mbufs and 1199 mbuf clusters are in use.
After a 3rd dump, 8545 mbufs and 2325 mbuf clusters are in use.

BTW, the first dump here is for "/" and the 2nd dump is
for "/usr" ("/usr" is about 10x bigger than "/").  To
eliminate the case where the issue is just the highwater
mark, the 3rd dump above is an identical dump of "/usr".
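
For reference, the growth per run can be worked out from the counts
above.  This is only a minimal sketch in POSIX sh; the "delta" helper
is a hypothetical name, and the numbers are simply the before/after
figures quoted in this message.

```shell
# Print how many mbufs and mbuf clusters one dump run left behind.
# Arguments: mbufs-before mbufs-after clusters-before clusters-after
delta() {
    echo "$(( $2 - $1 )) mbufs, $(( $4 - $3 )) clusters"
}

delta 40 437 33 146         # 1st dump, "/"
delta 438 4329 148 1197     # 2nd dump, "/usr"
delta 4330 8545 1199 2325   # 3rd dump, "/usr" again
```

The two "/usr" runs each leave behind a similar number of mbufs and
clusters, which is why this looks like a per-run leak rather than a
one-time highwater mark.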

So, since neither dump nor anything else extra is running after the
dump completes, I don't know why the system is "using" more mbufs
after it completes its dump.

I noticed that a wireless driver had an mbuf leak.  So, in case
it's relevant, I am using the xl(4) ethernet driver.

So, is this a memory/mbuf leak in the kernel?  Am I doing something
wrong?  Is there anything I can do to "clean up" after each dump?
My current work-around is to set a very large (40,000) maxclusters
value and reboot the system after each set of dumps but that really
rubs me the wrong way -- this is a UNIX(y 8-) system after all ...

I've provided some traces below.

Thanks,
Rob Ginn
[EMAIL PROTECTED]

BEFORE I run a remote dump (but after a reboot)
===============================================

Script started on Thu Jun  9 16:14:17 2005
demo# netstat -m
40 mbufs in use:
        35 mbufs allocated to data
        1 mbuf allocated to packet headers
        4 mbufs allocated to socket names and addresses
33/46/40000 mbuf clusters in use (current/peak/max)
112 Kbytes allocated to network (67% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
demo# ps xa
  PID TT   STAT      TIME COMMAND
    1 ??  Is      0:00.04 /sbin/init 
21191 ??  Is      0:00.03 syslogd: [priv] (syslogd)
26414 ??  I       0:00.09 syslogd -a /var/empty/dev/log 
30515 ??  Is      0:00.01 pflogd: [priv] (pflogd)
16850 ??  Is      0:00.01 portmap 
28016 ??  I       0:00.32 pflogd: [running] -s 116 -f /var/log/pflog (pflogd)
10782 ??  I       0:00.05 ypserv 
 4580 ??  Is      0:00.30 ypbind 
26954 ??  Is      0:00.01 mountd 
20553 ??  Is      0:00.01 nfsd: master (nfsd)
11934 ??  IL      0:00.00 nfsd: server (nfsd)
14637 ??  IL      0:00.40 nfsd: server (nfsd)
 6754 ??  IL      0:00.00 nfsd: server (nfsd)
15064 ??  IL      0:00.00 nfsd: server (nfsd)
16771 ??  Is      0:00.00 rpc.lockd 
20629 ??  Is      0:00.07 /usr/sbin/dhcpd xl0 
 3712 ??  Is      0:00.01 lpd 
26612 ??  Is      0:00.02 inetd 
21469 ??  Is      0:00.42 sendmail: accepting connections (sendmail)
24532 ??  Is      0:00.17 /usr/sbin/sshd 
14769 ??  I       0:00.01 rarpd -a 
25583 ??  Is      0:00.01 rpc.bootparamd 
13440 ??  Is      0:00.01 mopd -a 
10486 ??  Is      0:00.00 /usr/local/adm/bin/rpc.statd 
23664 ??  Is      0:00.04 cron 
27922 p0  Is      0:00.02 -bin/csh -i 
17109 p0  ?+      0:00.00 ps -xa 
12055 C0  Is      0:00.07 -csh (csh)
31440 C0  I+      0:00.01 script BEFORE 
20480 C0  I+      0:00.01 script BEFORE 
29807 C1  Is+     0:00.01 /usr/libexec/getty Pc ttyC1 
 5065 C2  Is+     0:00.01 /usr/libexec/getty Pc ttyC2 
 6641 C3  Is+     0:00.01 /usr/libexec/getty Pc ttyC3 
23297 C5  Is+     0:00.01 /usr/libexec/getty Pc ttyC5 


AFTER I run a remote dump
=========================

Script started on Thu Jun  9 16:16:12 2005
demo# netstat -m
437 mbufs in use:
        232 mbufs allocated to data
        201 mbufs allocated to packet headers
        4 mbufs allocated to socket names and addresses
146/188/40000 mbuf clusters in use (current/peak/max)
516 Kbytes allocated to network (77% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
demo# ps xa
  PID TT   STAT      TIME COMMAND
    1 ??  Is      0:00.04 /sbin/init 
21191 ??  Is      0:00.03 syslogd: [priv] (syslogd)
26414 ??  I       0:00.10 syslogd -a /var/empty/dev/log 
30515 ??  Is      0:00.01 pflogd: [priv] (pflogd)
16850 ??  Is      0:00.01 portmap 
28016 ??  I       0:00.32 pflogd: [running] -s 116 -f /var/log/pflog (pflogd)
10782 ??  I       0:00.05 ypserv 
 4580 ??  Is      0:00.31 ypbind 
26954 ??  Is      0:00.01 mountd 
20553 ??  Is      0:00.01 nfsd: master (nfsd)
11934 ??  IL      0:00.00 nfsd: server (nfsd)
14637 ??  IL      0:00.41 nfsd: server (nfsd)
 6754 ??  IL      0:00.00 nfsd: server (nfsd)
15064 ??  IL      0:00.00 nfsd: server (nfsd)
16771 ??  Is      0:00.00 rpc.lockd 
20629 ??  Is      0:00.07 /usr/sbin/dhcpd xl0 
 3712 ??  Is      0:00.01 lpd 
26612 ??  Is      0:00.02 inetd 
21469 ??  Is      0:00.42 sendmail: accepting connections (sendmail)
24532 ??  Is      0:00.17 /usr/sbin/sshd 
14769 ??  I       0:00.01 rarpd -a 
25583 ??  Is      0:00.01 rpc.bootparamd 
13440 ??  Is      0:00.01 mopd -a 
10486 ??  Is      0:00.00 /usr/local/adm/bin/rpc.statd 
23664 ??  Is      0:00.04 cron 
24790 p0  Is      0:00.02 -bin/csh -i 
 6134 p0  ?+      0:00.00 ps -xa 
12055 C0  Is      0:00.08 -csh (csh)
13116 C0  I+      0:00.01 script AFTER 
27577 C0  I+      0:00.01 script AFTER 
29807 C1  Is+     0:00.01 /usr/libexec/getty Pc ttyC1 
 5065 C2  Is+     0:00.01 /usr/libexec/getty Pc ttyC2 
 6641 C3  Is+     0:00.01 /usr/libexec/getty Pc ttyC3 
23297 C5  Is+     0:00.01 /usr/libexec/getty Pc ttyC5
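
To compare runs without reading the full traces, the two "in use"
counts can be pulled out of the netstat -m output.  A rough sketch,
assuming the output format is exactly as in the traces above; the
"mbuf_counts" name is hypothetical and not part of any OpenBSD tool.

```shell
# Hypothetical helper: reads `netstat -m` output (in the format shown
# in the traces above) on stdin and prints "<mbufs> <clusters>",
# i.e. the mbufs in use and the current mbuf clusters in use.
mbuf_counts() {
    awk '
        /mbufs in use:/        { mbufs = $1 }
        /mbuf clusters in use/ { split($1, c, "/"); clusters = c[1] }
        END                    { print mbufs, clusters }
    '
}

# Intended use on the affected machine, before and after each dump:
#   netstat -m | mbuf_counts
```

Logging that one line before and after each dump would give the same
before/after series as in the summary earlier in this message.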
