Hi misc,
I'm having frequent crashes on OpenBSD 4.2 (stable) on different
machines with the following error:
panic: pmap_pinit: kernel_map out of virtual space!
Specifically, we have two firewalls in a CARP pair (also running
pfsync) that showed the same panic about eight hours apart: first the
backup crashed, then the master.
I was able to run "boot dump" from ddb on the first one that crashed
(the backup box) and then recover the dump files with savecore (bsd.0
and bsd.0.core).
But now, when I try to inspect the dump with gdb, it fails as soon as I
type "target kvm bsd.0.core" and hit enter:
myhost:/var/crash# gdb
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-openbsd4.2".
(gdb) file bsd.0
Reading symbols from /u/data/crash/bsd.0...(no debugging symbols
found)...done.
(gdb) target kvm bsd.0.core
Cannot access memory at address 0xffbe6afc
(gdb)
Why could this be? I'm stuck at this point. I can run vmstat and ps
against the dump with the -N and -M options, but I don't think I'm
getting anything very useful. I did notice this in the vmstat -m
output, though:
Memory statistics by bucket size
Size In Use Free Requests HighWater Couldfree
16 3918 1714 228317130 1280 71
32 321 447 17128808 640 0
64 1222 1018 13797629 320 295030
128 405 43 4699894 160 0
256 229 59 14697840 80 71663
512 447 25 2447129 40 5629
1024 1274 30 941406 20 419326
2048 17 17 2263518 10 1768442
4096 21 6 1920222 5 0
8192 12 0 12 5 0
16384 2 0 4615 5 0
32768 4 0 8 5 0
Memory usage type by bucket size
Size Type(s)
16 devbuf, pcb, routetbl, ifaddr, sysctl, vnodes, UFS mount, dirhash,
in_multi, exec, xform_data, VM swap, UVM amap, UVM aobj, USB,
USB device, temp
32 devbuf, pcb, routetbl, ifaddr, sem, dirhash, proc, VFS cluster,
in_multi, ether_multi, exec, pfkey data, xform_data, VM swap,
UVM amap, USB, crypto data, temp
64 devbuf, pcb, routetbl, ifaddr, vnodes, sem, dirhash, in_multi,
pfkey data, UVM amap, USB, NDP, temp
128 devbuf, routetbl, ifaddr, iov, vnodes, ttys, exec, pfkey data, tdb,
UVM amap, USB, USB device, crypto data, NDP, temp
256 devbuf, routetbl, ifaddr, sysctl, ioctlops, vnodes, shm, VM map,
file desc, proc, NFS srvsock, NFS daemon, pfkey data, newblk,
UVM amap, USB, USB device, temp
512 devbuf, pcb, ifaddr, ioctlops, mount, UFS mount, shm, dirhash,
file desc, ttys, exec, UVM amap, USB device, temp
1024 devbuf, ioctlops, namecache, file desc, proc, ttys, exec, tdb,
UVM amap, UVM aobj, crypto data, temp
2048 devbuf, ifaddr, ioctlops, UFS mount, pagedep, VM swap, UVM amap,
temp
4096 devbuf, ioctlops, UFS mount, MSDOSFS mount, UVM amap, memdesc,
temp
8192 devbuf, NFS node, namecache, UFS quota, UFS mount, ISOFS mount,
inodedep, crypto data
16384 devbuf, UFS mount, VM swap, temp
32768 devbuf, namecache, VM swap, UVM amap
Memory statistics by type                                Type  Kern
Type InUse MemUse HighUse Limit Requests Limit Limit Size(s)
devbuf 2397 1445K 1445K 39322K 2458 0 0
16,32,64,128,256,512,1024,2048,4096,8192,16384,32768
pcb 95 8K 9K 39322K 469128 0 0 16,32,64,512
routetbl 220 20K 28K 39322K 1619554 0 0
16,32,64,128,256
ifaddr 118 22K 22K 39322K 120 0 0
16,32,64,128,256,512,2048
sysctl 2 1K 1K 39322K 2 0 0 16,256
ioctlops 0 0K 4K 39322K 10088436 0 0
256,512,1024,2048,4096
iov 0 0K 1K 39322K 18 0 0 128
mount 4 2K 4K 39322K 120 0 0 512
NFS node 1 8K 8K 39322K 1 0 0 8192
vnodes 41 7K 80K 39322K 350224 0 0
16,64,128,256
namecache 3 41K 41K 39322K 3 0 0
1024,8192,32768
UFS quota 1 8K 8K 39322K 1 0 0 8192
UFS mount 17 33K 68K 39322K 481 0 0
16,512,2048,4096,8192,16384
shm 2 1K 1K 39322K 2 0 0 256,512
VM map 4 1K 1K 39322K 4 0 0 256
sem 2 1K 1K 39322K 2 0 0 32,64
dirhash 78 15K 16K 39322K 8979 0 0 16,32,64,512
file desc 3 2K 7K 39322K 773 0 0 256,512,1024
proc 12 3K 3K 39322K 12 0 0 32,256,1024
VFS cluster 0 0K 1K 39322K 53484 0 0 32
NFS srvsock 1 1K 1K 39322K 1 0 0 256
NFS daemon 1 1K 1K 39322K 1 0 0 256
in_multi 114 5K 5K 39322K 115 0 0 16,32,64
ether_multi 60 2K 2K 39322K 61 0 0 32
ISOFS mount 1 8K 8K 39322K 1 0 0 8192
MSDOSFS mount 1 4K 4K 39322K 1 0 0 4096
ttys 420 263K 263K 39322K 420 0 0 128,512,1024
exec 0 0K 2K 39322K 1800891 0 0
16,32,128,512,1024
pfkey data 1 1K 1K 39322K 39 0 0
32,64,128,256
tdb 5 3K 3K 39322K 5 0 0 128,1024
xform_data 4 1K 1K 39322K 664042 0 0 16,32
pagedep 1 2K 2K 39322K 1 0 0 2048
inodedep 1 8K 8K 39322K 1 0 0 8192
newblk 1 1K 1K 39322K 1 0 0 256
VM swap 7 39K 39K 39322K 7 0 0
16,32,2048,16384,32768
UVM amap 4065 127K 205K 39322K 264897527 0 0
16,32,64,128,256,512,1024,2048,4096,32768
UVM aobj 2 2K 2K 39322K 2 0 0 16,1024
USB 46 5K 5K 39322K 46 0 0
16,32,64,128,256
USB device 13 6K 6K 39322K 13 0 0
16,128,256,512
memdesc 1 4K 4K 39322K 1 0 0 4096
crypto data 12 18K 18K 39322K 12 0 0
32,128,1024,8192
NDP 21 2K 3K 39322K 25 0 0 64,128
temp 94 10K 28K 39322K 6261196 0 0
16,32,64,128,256,512,1024,2048,4096,16384
Memory Totals: In Use Free Requests
2115K 225K 286218211
Memory resource pool statistics
Name Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg
Maxpg Idle
phpool 32 685 0 0 6 0 6 6 0
8 0
extentpl 20 229 0 196 1 0 1 1 0
8 0
pmappl 84 1219710 0 1219679 1 0 1 1 0
8 0
vmsppl 188 1219710 0 1219679 3 0 3 3 0
8 1
vmmpepl 88 191190366 0 191186089 114 0 114 114 0
179 20
vmmpekpl 88 4509023 0 4495911 286 0 286 286 0
8 0
aobjpl 52 1 0 0 1 0 1 1 0
8 0
amappl 44 86847249 0 86845907 21 0 21 21 0
45 6
anonpl 16 113809131 0 113806618 44 0 44 44 0
62 34
bufpl 124 1906610 0 1899767 218 4 214 214 0
8 0
mbpl 256 139948029 0 139947363 68 0 68 68 1
384 7
mclpl 2048 89771507 0 89770865 479 0 479 479 4
3072 29
sockpl 212 977092 0 977000 6 0 6 6 0
8 1
procpl 344 1219725 0 1219679 6 0 6 6 0
8 1
processpl 20 1219725 0 1219679 1 0 1 1 0
8 0
zombiepl 72 1219679 0 1219679 1 0 1 1 0
8 1
ucredpl 80 590685 0 590629 2 0 2 2 0
8 0
pgrppl 24 180756 0 180741 1 0 1 1 0
8 0
sessionpl 48 180754 0 180739 1 0 1 1 0
8 0
pcredpl 24 1219725 0 1219679 1 0 1 1 0
8 0
lockfpl 52 3023 0 3021 1 0 1 1 0
8 0
filepl 88 50017650 0 50017503 4 0 4 4 0
8 0
fdescpl 296 1219726 0 1219679 5 0 5 5 0
8 1
pipepl 72 2056576 0 2056553 1 0 1 1 0
8 0
kqueuepl 192 686 0 683 1 0 1 1 0
8 0
knotepl 64 2923 0 2874 2 0 2 2 0
8 1
sigapl 316 1219710 0 1219679 4 0 4 4 0
8 1
wqtasks 20 490193 0 490193 1 0 1 1 0
8 1
wdcspl 96 5130491 0 5130490 1 0 1 1 0
8 0
scxspl 132 3 0 3 1 0 1 1 0
8 1
namei 1024 68487004 0 68487004 2 0 2 2 0
8 2
vnodes 148 2621 0 0 98 0 98 98 0
8 0
nchpl 72 3315172 0 3313864 295 271 24 24 0
8 0
ffsino 184 5248219 0 5245603 291 172 119 119 0
8 0
dino1pl 128 5248218 0 5245602 195 110 85 85 0
8 0
dirhash 1024 10230 0 10146 25 0 25 25 0
128 4
pfrulepl 824 265 0 10 67 0 67 67 0
8 2
pfstatepl 204 10843516 4940 10843385 527 0 527 527 0
527 514
pfstatekeypl 108 10843516 0 5769657 138375 1243 137132 137132 0
8 0
pfpooladdrpl 68 26 0 0 1 0 1 1 0
8 0
pfrktable 1240 84 0 42 28 0 28 28 0
334 0
pfrkentry 156 479 0 0 19 0 19 19 0
7693 0
pfosfpen 108 1392 0 696 30 11 19 19 0
8 0
pfosfp 28 814 0 407 3 0 3 3 0
8 0
rtentpl 108 362 0 283 3 0 3 3 0
8 0
rttmrpl 32 1 0 1 1 0 1 1 0
8 1
tcpcbpl 400 73879 0 73876 1 0 1 1 0
8 0
tcpqepl 16 162 0 162 1 0 1 1 0
13 1
sackhlpl 20 2 0 2 1 0 1 1 0
162 1
synpl 184 73783 0 73783 1 0 1 1 0
8 1
plimitpl 152 107808 0 107796 1 0 1 1 0
8 0
inpcbpl 216 507973 0 507967 1 0 1 1 0
8 0
ipsec policy 212 16 0 0 1 0 1 1 0
8 0
In use 540926K, total allocated 559516K; utilization 96.7%
In particular, I noticed this:
Memory Totals: In Use Free Requests
2115K 225K 286218211
And this:
In use 540926K, total allocated 559516K; utilization 96.7%
That seems to leave little to spare. I also checked that a swap device
is configured, like this:
myhost:/var/crash# swapctl -l
Device 512-blocks Used Avail Capacity Priority
swap_device 1048320 0 1048320 0% 0
We have a suspect: a script written in Python that monitors several
parameters every 5 minutes, calling vmstat, iostat, pfctl, etc. I find
that explanation odd, though, as this is the first time this has
happened, on different hardware, and so close together in time. The
Python version is python-2.4.4p4 and the script runs as root (some
statistics can only be retrieved by root). Both machines were booted at
approximately the same time, about a month ago, by the way. I don't
consider this theory very likely, as the same script has been running
on other OpenBSD versions for a long time.
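In case it matters, the script is essentially structured like this (a
simplified sketch, not the actual code; the command list and names here
are illustrative, and the real script also logs the output):

```python
# Simplified sketch of our monitoring script (illustrative only).
import subprocess
import time

# Commands polled every 5 minutes; pfctl requires root.
COMMANDS = [
    ["vmstat", "-m"],
    ["iostat"],
    ["pfctl", "-si"],
]

def poll_once(commands):
    """Run each command once and return a dict of {name: stdout}."""
    results = {}
    for cmd in commands:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        out, _ = proc.communicate()  # waits for exit, so no zombies
        results[cmd[0]] = out
    return results

def main():
    while True:
        poll_once(COMMANDS)
        time.sleep(300)  # 5 minutes
```

So it is just plain subprocess calls that are waited on; nothing stays
running between polls.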
The other thing I can think of is something related to carp or pfsync.
Any input on this will be much appreciated.
Thank you,
Martín.