Hi,
OpenBSD 3.6 (I'm running i386) seems to have a memory leak in its use
of mbufs for network traffic. The default number of mbuf clusters
(kern.maxclusters) is fine until I run a series of dump commands to a
tape drive on a remote system. After the dump completes, the number of
mbufs in use remains high, and each time I run another dump the number
climbs. Soon I run out of them and all Ethernet traffic on the system
stops (which hangs all the other systems that depend on this one).
Increasing kern.maxclusters at that point unblocks the system, although
the running dump terminates.
Fortunately, when it hangs, it prints a message saying that it ran out
of mbuf clusters and suggesting an increase to kern.maxclusters.
BTW, kudos to whoever put that message and suggestion in; it is
a great/necessary feature that is so often missing in products.
Note that after the dump completes, there are no extra processes
left (the number of processes before I run the rdump equals the number
of processes after the rdump completes).
I also checked with ipcs to see whether dump was using any shared
memory; as expected, it wasn't, and no segments were in use.
Here is the dump command being used:
dump 0udbsf 54000 64 96000 [EMAIL PROTECTED]:/dev/nrst0 /
Before the dump, 40 mbufs and 33 mbuf clusters are in use.
After the dump, 437 mbufs and 146 mbuf clusters are in use.
Before a 2nd dump, 438 mbufs and 148 mbuf clusters are in use.
After a 2nd dump, 4329 mbufs and 1197 mbuf clusters are in use.
Before a 3rd dump, 4330 mbufs and 1199 mbuf clusters are in use.
After a 3rd dump, 8545 mbufs and 2325 mbuf clusters are in use.
BTW, the first dump here is of "/" and the 2nd dump is of
"/usr" ("/usr" is about 10x bigger than "/"). To rule out
the possibility that the numbers are just a high-water mark,
the 3rd dump above is an identical dump of "/usr".
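Working out the growth per dump from the figures above makes the
pattern easier to see. This is just a quick awk sketch; the labels
and column layout are my own choice, and the numbers are the ones
reported above:

```sh
#!/bin/sh
# Per-dump growth in mbuf usage, computed from the netstat -m figures above.
# Input columns: label, mbufs before, mbufs after, clusters before, clusters after
mbuf_growth() {
    awk '{ printf "%s: +%d mbufs, +%d clusters\n", $1, $3 - $2, $5 - $4 }'
}

mbuf_growth <<'EOF'
dump1(/) 40 437 33 146
dump2(/usr) 438 4329 148 1197
dump3(/usr) 4330 8545 1199 2325
EOF
```

which prints:

dump1(/): +397 mbufs, +113 clusters
dump2(/usr): +3891 mbufs, +1049 clusters
dump3(/usr): +4215 mbufs, +1126 clusters

So the leak seems to roughly scale with the size of the dump, and the
identical 3rd dump of "/usr" leaks about as much again as the 2nd
instead of reusing what the 2nd left behind.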
So, since neither dump nor any other extra process is still running
after the dump completes, I don't know why the system is "using" more
mbufs afterwards.
I noticed that a wireless driver had an mbuf leak, so, in case
it's relevant, I am using the xl(4) Ethernet driver.
So, is this a memory/mbuf leak in the kernel? Am I doing something
wrong? Is there anything I can do to "clean up" after each dump?
My current workaround is to set a very large maxclusters value
(40,000) and reboot the system after each set of dumps, but that
really rubs me the wrong way; this is a UNIX(y 8-) system, after all...
I've provided some traces below.
Thanks,
Rob Ginn
[EMAIL PROTECTED]
BEFORE I run a remote dump (but after a reboot)
===============================================
Script started on Thu Jun 9 16:14:17 2005
demo# netstat -m
40 mbufs in use:
        35 mbufs allocated to data
        1 mbuf allocated to packet headers
        4 mbufs allocated to socket names and addresses
33/46/40000 mbuf clusters in use (current/peak/max)
112 Kbytes allocated to network (67% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
demo# ps xa
PID TT STAT TIME COMMAND
1 ?? Is 0:00.04 /sbin/init
21191 ?? Is 0:00.03 syslogd: [priv] (syslogd)
26414 ?? I 0:00.09 syslogd -a /var/empty/dev/log
30515 ?? Is 0:00.01 pflogd: [priv] (pflogd)
16850 ?? Is 0:00.01 portmap
28016 ?? I 0:00.32 pflogd: [running] -s 116 -f /var/log/pflog (pflogd)
10782 ?? I 0:00.05 ypserv
4580 ?? Is 0:00.30 ypbind
26954 ?? Is 0:00.01 mountd
20553 ?? Is 0:00.01 nfsd: master (nfsd)
11934 ?? IL 0:00.00 nfsd: server (nfsd)
14637 ?? IL 0:00.40 nfsd: server (nfsd)
6754 ?? IL 0:00.00 nfsd: server (nfsd)
15064 ?? IL 0:00.00 nfsd: server (nfsd)
16771 ?? Is 0:00.00 rpc.lockd
20629 ?? Is 0:00.07 /usr/sbin/dhcpd xl0
3712 ?? Is 0:00.01 lpd
26612 ?? Is 0:00.02 inetd
21469 ?? Is 0:00.42 sendmail: accepting connections (sendmail)
24532 ?? Is 0:00.17 /usr/sbin/sshd
14769 ?? I 0:00.01 rarpd -a
25583 ?? Is 0:00.01 rpc.bootparamd
13440 ?? Is 0:00.01 mopd -a
10486 ?? Is 0:00.00 /usr/local/adm/bin/rpc.statd
23664 ?? Is 0:00.04 cron
27922 p0 Is 0:00.02 -bin/csh -i
17109 p0 ?+ 0:00.00 ps -xa
12055 C0 Is 0:00.07 -csh (csh)
31440 C0 I+ 0:00.01 script BEFORE
20480 C0 I+ 0:00.01 script BEFORE
29807 C1 Is+ 0:00.01 /usr/libexec/getty Pc ttyC1
5065 C2 Is+ 0:00.01 /usr/libexec/getty Pc ttyC2
6641 C3 Is+ 0:00.01 /usr/libexec/getty Pc ttyC3
23297 C5 Is+ 0:00.01 /usr/libexec/getty Pc ttyC5
AFTER I run a remote dump
=========================
Script started on Thu Jun 9 16:16:12 2005
demo# netstat -m
437 mbufs in use:
        232 mbufs allocated to data
        201 mbufs allocated to packet headers
        4 mbufs allocated to socket names and addresses
146/188/40000 mbuf clusters in use (current/peak/max)
516 Kbytes allocated to network (77% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
demo# ps xa
PID TT STAT TIME COMMAND
1 ?? Is 0:00.04 /sbin/init
21191 ?? Is 0:00.03 syslogd: [priv] (syslogd)
26414 ?? I 0:00.10 syslogd -a /var/empty/dev/log
30515 ?? Is 0:00.01 pflogd: [priv] (pflogd)
16850 ?? Is 0:00.01 portmap
28016 ?? I 0:00.32 pflogd: [running] -s 116 -f /var/log/pflog (pflogd)
10782 ?? I 0:00.05 ypserv
4580 ?? Is 0:00.31 ypbind
26954 ?? Is 0:00.01 mountd
20553 ?? Is 0:00.01 nfsd: master (nfsd)
11934 ?? IL 0:00.00 nfsd: server (nfsd)
14637 ?? IL 0:00.41 nfsd: server (nfsd)
6754 ?? IL 0:00.00 nfsd: server (nfsd)
15064 ?? IL 0:00.00 nfsd: server (nfsd)
16771 ?? Is 0:00.00 rpc.lockd
20629 ?? Is 0:00.07 /usr/sbin/dhcpd xl0
3712 ?? Is 0:00.01 lpd
26612 ?? Is 0:00.02 inetd
21469 ?? Is 0:00.42 sendmail: accepting connections (sendmail)
24532 ?? Is 0:00.17 /usr/sbin/sshd
14769 ?? I 0:00.01 rarpd -a
25583 ?? Is 0:00.01 rpc.bootparamd
13440 ?? Is 0:00.01 mopd -a
10486 ?? Is 0:00.00 /usr/local/adm/bin/rpc.statd
23664 ?? Is 0:00.04 cron
24790 p0 Is 0:00.02 -bin/csh -i
6134 p0 ?+ 0:00.00 ps -xa
12055 C0 Is 0:00.08 -csh (csh)
13116 C0 I+ 0:00.01 script AFTER
27577 C0 I+ 0:00.01 script AFTER
29807 C1 Is+ 0:00.01 /usr/libexec/getty Pc ttyC1
5065 C2 Is+ 0:00.01 /usr/libexec/getty Pc ttyC2
6641 C3 Is+ 0:00.01 /usr/libexec/getty Pc ttyC3
23297 C5 Is+ 0:00.01 /usr/libexec/getty Pc ttyC5
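Rather than eyeballing the two traces, the cluster line of netstat -m
can be pulled out with a small awk filter and compared before and after
each dump. This is only a sketch: it assumes the "current/peak/max"
line format shown in the traces above, and the clusters() function name
is my own:

```sh
#!/bin/sh
# Extract the current/peak/max mbuf cluster counts from netstat -m output.
# Assumes the line format seen in the traces, e.g.
#   33/46/40000 mbuf clusters in use (current/peak/max)
clusters() {
    awk '/mbuf clusters in use/ {
        split($1, c, "/")
        print "current=" c[1], "peak=" c[2], "max=" c[3]
    }'
}

# The cluster lines from the BEFORE and AFTER traces:
printf '%s\n' \
    '33/46/40000 mbuf clusters in use (current/peak/max)' \
    '146/188/40000 mbuf clusters in use (current/peak/max)' | clusters
```

On the machine itself, running "netstat -m | clusters" before and after
each dump should give the same one-line summary for live data.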