You could try actually looking at the errata list first...

006: RELIABILITY FIX: November 21, 2004   All architectures
Fix for transmit side breakage on macppc and mbuf leaks with xl(4).

I wonder what that is for..

On Thu, Jun 09, 2005 at 09:10:17PM -0500, R Ginn wrote:
> Hi,
> OpenBSD 3.6 (I'm running i386) seems to have a memory leak as regards
> to its use of mbufs for network traffic. The default number of
> mbuf clusters (kern.maxclusters) is fine until I run a series of
> dump commands to a tape drive on a remote system. After the dump
> completes, the number of mbufs in use remains high. Each time I
> run another dump, the number climbs. Soon I run out of them and
> the system locks all ethernet traffic (which hangs all the other
> systems depending on this one). Increasing the kern.maxclusters
> at this point unlocks the system (although the dump terminates at
> that point).
>
> Fortunately, when it hangs, it spits out a message to indicate
> that it ran out of mbuf clusters and to increase kern.maxclusters
> BTW, kudos to whoever put that message and suggestion in, it is
> a great/necessary feature that is so often missing in products.
>
> Note that after the dump completes, there are no extra processes
> left (the # of processes before I run the rdump = the # of processes
> after the rdump completes).
>
> I checked w/ipcs to see if dump was using any shared memory
> but, as expected, it doesn't and there weren't any in use.
>
> Here is the dump command being used:
>
>   dump 0udbsf 54000 64 96000 [EMAIL PROTECTED]:/dev/nrst0 /
>
> Before the dump, 40 mbufs and 33 mbuf clusters are in use.
> After the dump, 437 mbufs and 146 mbuf clusters are in use.
> Before a 2nd dump, 438 mbufs and 148 mbuf clusters are in use.
> After a 2nd dump, 4329 mbufs and 1197 mbuf clusters are in use.
> Before a 3rd dump, 4330 mbufs and 1199 mbuf clusters are in use.
> After a 3rd dump, 8545 mbufs and 2325 mbuf clusters are in use.
>
> BTW, the first dump here is for "/" and the 2nd dump is
> for "/usr" ("/usr" is about 10x bigger than "/"). To
> eliminate the case where the issue is just the highwater
> mark, the 3rd dump above is an identical dump of "/usr".
>
> So, since dump (and nothing else extra) is running after the dump
> completes, I don't know why the system is "using" more mbufs after
> it completes its dump.
>
> I noticed that a wireless driver had an mbuf leak. So, in case
> it's relevant, I am using the xl(4) ethernet driver.
>
> So, is this a memory/mbuf leak in the kernel? Am I doing something
> wrong? Is there anything I can do to "clean up" after each dump?
> My current work-around is to set a very large (40,000) maxclusters
> value and reboot the system after each set of dumps but that really
> rubs me the wrong way -- this is a UNIX(y 8-) system after all ...
>
> I've provided some traces below.
>
> Thanks,
> Rob Ginn
> [EMAIL PROTECTED]
>
> BEFORE I run an remote dump (but after a reboot)
> ================================================
>
> Script started on Thu Jun 9 16:14:17 2005
> demo# netstat -m
> 40 mbufs in use:
>         35 mbufs allocated to data
>         1 mbuf allocated to packet headers
>         4 mbufs allocated to socket names and addresses
> 33/46/40000 mbuf clusters in use (current/peak/max)
> 112 Kbytes allocated to network (67% in use)
> 0 requests for memory denied
> 0 requests for memory delayed
> 0 calls to protocol drain routines
> demo# ps xa
>   PID TT   STAT      TIME COMMAND
>     1 ??   Is     0:00.04 /sbin/init
> 21191 ??   Is     0:00.03 syslogd: [priv] (syslogd)
> 26414 ??   I      0:00.09 syslogd -a /var/empty/dev/log
> 30515 ??   Is     0:00.01 pflogd: [priv] (pflogd)
> 16850 ??   Is     0:00.01 portmap
> 28016 ??   I      0:00.32 pflogd: [running] -s 116 -f /var/log/pflog (pflogd)
> 10782 ??   I      0:00.05 ypserv
>  4580 ??   Is     0:00.30 ypbind
> 26954 ??   Is     0:00.01 mountd
> 20553 ??   Is     0:00.01 nfsd: master (nfsd)
> 11934 ??   IL     0:00.00 nfsd: server (nfsd)
> 14637 ??   IL     0:00.40 nfsd: server (nfsd)
>  6754 ??   IL     0:00.00 nfsd: server (nfsd)
> 15064 ??   IL     0:00.00 nfsd: server (nfsd)
> 16771 ??   Is     0:00.00 rpc.lockd
> 20629 ??   Is     0:00.07 /usr/sbin/dhcpd xl0
>  3712 ??   Is     0:00.01 lpd
> 26612 ??   Is     0:00.02 inetd
> 21469 ??   Is     0:00.42 sendmail: accepting connections (sendmail)
> 24532 ??   Is     0:00.17 /usr/sbin/sshd
> 14769 ??   I      0:00.01 rarpd -a
> 25583 ??   Is     0:00.01 rpc.bootparamd
> 13440 ??   Is     0:00.01 mopd -a
> 10486 ??   Is     0:00.00 /usr/local/adm/bin/rpc.statd
> 23664 ??   Is     0:00.04 cron
> 27922 p0   Is     0:00.02 -bin/csh -i
> 17109 p0   ?+     0:00.00 ps -xa
> 12055 C0   Is     0:00.07 -csh (csh)
> 31440 C0   I+     0:00.01 script BEFORE
> 20480 C0   I+     0:00.01 script BEFORE
> 29807 C1   Is+    0:00.01 /usr/libexec/getty Pc ttyC1
>  5065 C2   Is+    0:00.01 /usr/libexec/getty Pc ttyC2
>  6641 C3   Is+    0:00.01 /usr/libexec/getty Pc ttyC3
> 23297 C5   Is+    0:00.01 /usr/libexec/getty Pc ttyC5
>
>
> AFTER I run a remote dump
> =========================
>
> Script started on Thu Jun 9 16:16:12 2005
> demo# netstat -m
> 437 mbufs in use:
>         232 mbufs allocated to data
>         201 mbufs allocated to packet headers
>         4 mbufs allocated to socket names and addresses
> 146/188/40000 mbuf clusters in use (current/peak/max)
> 516 Kbytes allocated to network (77% in use)
> 0 requests for memory denied
> 0 requests for memory delayed
> 0 calls to protocol drain routines
> demo# ps xa
>   PID TT   STAT      TIME COMMAND
>     1 ??   Is     0:00.04 /sbin/init
> 21191 ??   Is     0:00.03 syslogd: [priv] (syslogd)
> 26414 ??   I      0:00.10 syslogd -a /var/empty/dev/log
> 30515 ??   Is     0:00.01 pflogd: [priv] (pflogd)
> 16850 ??   Is     0:00.01 portmap
> 28016 ??   I      0:00.32 pflogd: [running] -s 116 -f /var/log/pflog (pflogd)
> 10782 ??   I      0:00.05 ypserv
>  4580 ??   Is     0:00.31 ypbind
> 26954 ??   Is     0:00.01 mountd
> 20553 ??   Is     0:00.01 nfsd: master (nfsd)
> 11934 ??   IL     0:00.00 nfsd: server (nfsd)
> 14637 ??   IL     0:00.41 nfsd: server (nfsd)
>  6754 ??   IL     0:00.00 nfsd: server (nfsd)
> 15064 ??   IL     0:00.00 nfsd: server (nfsd)
> 16771 ??   Is     0:00.00 rpc.lockd
> 20629 ??   Is     0:00.07 /usr/sbin/dhcpd xl0
>  3712 ??   Is     0:00.01 lpd
> 26612 ??   Is     0:00.02 inetd
> 21469 ??   Is     0:00.42 sendmail: accepting connections (sendmail)
> 24532 ??   Is     0:00.17 /usr/sbin/sshd
> 14769 ??   I      0:00.01 rarpd -a
> 25583 ??   Is     0:00.01 rpc.bootparamd
> 13440 ??   Is     0:00.01 mopd -a
> 10486 ??   Is     0:00.00 /usr/local/adm/bin/rpc.statd
> 23664 ??   Is     0:00.04 cron
> 24790 p0   Is     0:00.02 -bin/csh -i
>  6134 p0   ?+     0:00.00 ps -xa
> 12055 C0   Is     0:00.08 -csh (csh)
> 13116 C0   I+     0:00.01 script AFTER
> 27577 C0   I+     0:00.01 script AFTER
> 29807 C1   Is+    0:00.01 /usr/libexec/getty Pc ttyC1
>  5065 C2   Is+    0:00.01 /usr/libexec/getty Pc ttyC2
>  6641 C3   Is+    0:00.01 /usr/libexec/getty Pc ttyC3
> 23297 C5   Is+    0:00.01 /usr/libexec/getty Pc ttyC5
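
If you want to track the counts across runs rather than eyeballing netstat -m each time, something along these lines might help. This is only a rough sketch in Python, not anything from the report or from OpenBSD itself: the function names parse_netstat_m and growth are made up for illustration, and the regular expressions assume the netstat -m layout shown in the traces above (with or without "> " quoting). The sample text is copied from those two traces.

```python
import re

# Optional mail-quote prefix ("> ") so pasted, quoted traces also parse.
_QUOTE = r"^(?:>\s*)*"


def parse_netstat_m(text):
    """Pull mbuf and cluster counts out of `netstat -m` output text."""
    mbufs = re.search(_QUOTE + r"(\d+) mbufs? in use", text, re.MULTILINE)
    clusters = re.search(
        _QUOTE + r"(\d+)/(\d+)/(\d+) mbuf clusters in use", text, re.MULTILINE
    )
    if mbufs is None or clusters is None:
        raise ValueError("unrecognized netstat -m output")
    return {
        "mbufs": int(mbufs.group(1)),
        "clusters": int(clusters.group(1)),
        "peak": int(clusters.group(2)),
        "max": int(clusters.group(3)),
    }


def growth(before, after):
    """Difference in in-use mbufs and clusters between two snapshots."""
    return {key: after[key] - before[key] for key in ("mbufs", "clusters")}


# Snapshots taken from the BEFORE and AFTER traces in the report.
BEFORE = """\
40 mbufs in use:
        35 mbufs allocated to data
        1 mbuf allocated to packet headers
        4 mbufs allocated to socket names and addresses
33/46/40000 mbuf clusters in use (current/peak/max)
"""

AFTER = """\
437 mbufs in use:
        232 mbufs allocated to data
        201 mbufs allocated to packet headers
        4 mbufs allocated to socket names and addresses
146/188/40000 mbuf clusters in use (current/peak/max)
"""

if __name__ == "__main__":
    delta = growth(parse_netstat_m(BEFORE), parse_netstat_m(AFTER))
    print("mbufs +%d, clusters +%d" % (delta["mbufs"], delta["clusters"]))
    # prints: mbufs +397, clusters +113
```

Counts that keep climbing after every dump and never come back down, as in the numbers reported above, are what you would expect from a leak rather than from a high-water mark.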