OK, cereal now.
On Jan 08 11:47:25, [email protected] wrote:
> This is 6.6-current on an ALIX (dmesg below), serving as my home server.
> It's where I store the daily.local dumps, into /backup on a big USB disk.
> The machine often crashes into a ddb when saving those dumps.
This time, it panicked on 'ls /backup'
while remote machines were rsyncing their dumps into /backup.
During normal operation:
$ cat /etc/fstab
bf940e6c7aaf2c50.a / ffs rw 1 1
bf940e6c7aaf2c50.d /usr ffs rw,nodev,noatime 1 2
bf940e6c7aaf2c50.e /usr/local ffs rw,nodev,noatime,wxallowed 1 2
bf940e6c7aaf2c50.f /var ffs rw,nodev,noatime,nosuid 1 2
bf940e6c7aaf2c50.g /var/log ffs rw,nodev,noatime,nosuid 1 2
bf940e6c7aaf2c50.h /var/www ffs rw,nodev,noatime,nosuid 1 2
bf940e6c7aaf2c50.i /var/mail ffs rw,nodev,noatime,nosuid 1 2
bf940e6c7aaf2c50.j /var/postgresql ffs rw,nodev,noatime,nosuid 1 2
bf940e6c7aaf2c50.k /tmp ffs rw,nodev,noatime,nosuid 1 2
bf940e6c7aaf2c50.l /home ffs rw,nodev,noatime,nosuid 1 2
31efc37173c4c30f.a /backup ffs rw,nodev,noatime,nosuid
$ mount -v
/dev/wd0a (bf940e6c7aaf2c50.a) on / type ffs (rw, local, ctime=Tue Jan 14
08:24:40 2020)
/dev/wd0d (bf940e6c7aaf2c50.d) on /usr type ffs (rw, local, noatime, nodev,
ctime=Tue Jan 14 08:24:30 2020)
/dev/wd0e (bf940e6c7aaf2c50.e) on /usr/local type ffs (rw, local, noatime,
nodev, wxallowed, ctime=Tue Jan 14 08:24:30 2020)
/dev/wd0f (bf940e6c7aaf2c50.f) on /var type ffs (rw, local, noatime, nodev,
nosuid, ctime=Tue Jan 14 08:24:30 2020)
/dev/wd0g (bf940e6c7aaf2c50.g) on /var/log type ffs (rw, local, noatime, nodev,
nosuid, ctime=Tue Jan 14 08:24:30 2020)
/dev/wd0h (bf940e6c7aaf2c50.h) on /var/www type ffs (rw, local, noatime, nodev,
nosuid, ctime=Tue Jan 14 08:24:30 2020)
/dev/wd0i (bf940e6c7aaf2c50.i) on /var/mail type ffs (rw, local, noatime,
nodev, nosuid, ctime=Tue Jan 14 08:24:30 2020)
/dev/wd0j (bf940e6c7aaf2c50.j) on /var/postgresql type ffs (rw, local, noatime,
nodev, nosuid, ctime=Tue Jan 14 08:24:30 2020)
/dev/wd0k (bf940e6c7aaf2c50.k) on /tmp type ffs (rw, local, noatime, nodev,
nosuid, ctime=Tue Jan 14 08:24:30 2020)
/dev/wd0l (bf940e6c7aaf2c50.l) on /home type ffs (rw, local, noatime, nodev,
nosuid, ctime=Tue Jan 14 08:24:30 2020)
/dev/sd0a (31efc37173c4c30f.a) on /backup type ffs (rw, local, noatime, nodev,
nosuid, ctime=Tue Jan 14 09:01:34 2020)
$ df -h
Filesystem Size Used Avail Capacity Mounted on
/dev/wd0a 254M 49.2M 192M 20% /
/dev/wd0d 1008M 484M 474M 50% /usr
/dev/wd0e 1.5G 386M 1.0G 27% /usr/local
/dev/wd0f 131M 9.5M 115M 8% /var
/dev/wd0g 131M 123M 930K 99% /var/log
/dev/wd0h 2.0G 1.4G 481M 75% /var/www
/dev/wd0i 131M 53.1M 70.9M 43% /var/mail
/dev/wd0j 131M 2.0K 124M 0% /var/postgresql
/dev/wd0k 131M 4.0K 124M 0% /tmp
/dev/wd0l 1.7G 1.4G 230M 86% /home
/dev/sd0a 931G 135G 750G 15% /backup
> It always happens during the nightly heavy writes.
> The machine itself is dumping local fs's into /backup,
> and a few other machines rsync their dumps into there.
> All of that is run from daily.local (on each of the machines).
This is the panic:
panic: ehci_device_clear_toggle: queue active
Stopped at db_enter+0x4: popl %ebp
TID PID UID PRFLAGS PFLAGS CPU COMMAND
*125358 98098 0 0x14000 0x200 0 usbatsk
db_enter() at db_enter+0x4
panic(d0b39e8d) at panic+0xcc
ehci_device_clear_toggle(d1923b80) at ehci_device_clear_toggle+0x25
umass_bbb_state(d18e0640,d1954e00,f) at umass_bbb_state+0x14b
usb_transfer_complete(d18e0640) at usb_transfer_complete+0x1b7
ehci_abort_xfer(d18e0640,f) at ehci_abort_xferpanic: ehci_device_clear_toggle:
queue active
Stopped at db_enter+0x4: popl %ebp
TID PID UID PRFLAGS PFLAGS CPU COMMAND
*125358 98098 0 0x14000 0x200 0 usbatsk
db_enter() at db_enter+0x4
panic(d0b39e8d) at panic+0xcc
ehci_d
ddb> trace
db_enter() at db_enter+0x4
panic(d0b39e8d) at panic+0xcc
ehci_device_clear_toggle(d1923b80) at ehci_device_clear_toggle+0x25
umass_bbb_state(d18e0640,d1954e00,f) at umass_bbb_state+0x14b
usb_transfer_complete(d18e0640) at usb_transfer_complete+0x1b7
ehci_abort_xfer(d18e0640,f) at ehci_abort_xfer+0x2a5
ehci_timeout_task(d18e0640) at ehci_timeout_task+0x1c
usb_abort_task_thread(d18dcbe0) at usb_abort_task_thread+0x98
ddb> ps
PID TID PPID UID S FLAGS WAIT COMMAND
10274 303125 2215 1000 3 0x100003 biowait ksh
2215 479501 10522 1000 3 0x90 select sshd
10522 375696 65433 0 3 0x92 poll sshd
2783 186167 65433 0 3 0x82 netio sshd
16791 175461 32411 0 3 0x100082 nanosleep sleep
32411 408555 86211 0 3 0x10008a pause sh
86211 373939 43164 0 3 0x100090 piperd cron
74312 511446 55156 1000 3 0 inode rsync
55156 267723 17081 1000 3 0x82 select rsync
17081 109318 40374 1000 3 0x10008a pause ksh
40374 348103 35069 1000 3 0x90 select sshd
35069 263033 65433 0 3 0x92 poll sshd
95492 378695 16393 1000 3 0 biowait rsync
16393 275467 69978 1000 3 0x82 select rsync
69978 69676 33879 1000 3 0x10008a pause ksh
33879 207896 3716 1000 3 0x90 select sshd
3716 108288 65433 0 3 0x92 poll sshd
52665 366181 83150 1000 3 0 biowait rsync
83150 258754 3331 1000 3 0x82 select rsync
3331 23380 76282 1000 3 0x10008a pause ksh
76282 30302 9526 1000 3 0x90 select sshd
--9526ore24952 65433 0 3 0x92 poll sshd
65257 154470 1 0 3 0x100083 ttyin getty
5060 100746 1 0 3 0x100083 ttyin getty
36162 158059 1 0 3 0x100083 ttyin getty
41188 123069 1 0 3 0x100083 ttyin getty
29419 55117 1 0 3 0x100083 ttyin getty
96178 161040 1 0 3 0x100083 ttyin getty
43164 323504 1 0 3 0x100098 poll cron
60842 485881 1 62 3 0x100090 bpf spamlogd
91511 190579 6152 62 3 0x100090 piperd spamd
19596 269263 6152 62 3 0x100090 poll spamd
6152 35793 1 62 3 0x100090 nanosleep spamd
86910 304526 1 0 3 0x100080 kqread httpd
99066 107831 1 67 3 0x100092 kqread httpd
42371 129422 1 67 3 0x100092 kqread httpd
35161 459075 1 67 3 0x100092 kqread httpd
57014 239036 1 67 3 0x100092 kqread httpd
63252 238046 916 95 3 0x100092 kqread smtpd
56940 1192 916 103 3 0x100092 kqread smtpd
80613 106178 916 95 3 0x100092 kqread smtpd
57181 187925 916 95 3 0x100092 kqread smtpd
63175 460639 916 95 3 0x100092 kqread smtpd
3873 380642 916 95 3 0x100092 kqread smtpd
--d916or494473 1 0 3 0x100080 kqread smtpd
65433 76251 1 0 3 0x80 select sshd
88600 110121 1 0 3 0x100080 poll ntpd
19297 195954 20100 83 3 0x100092 poll ntpd
20100 17660 1 83 3 0x100092 poll ntpd
31191 85959 26397 74 3 0x100092 bpf pflogd
26397 187118 1 0 3 0x80 netio pflogd
68309 448011 94742 73 3 0x100090 kqread syslogd
94742 69614 1 0 3 0x100082 netio syslogd
53492 243971 27347 115 3 0x100092 kqread slaacd
52858 458237 27347 115 3 0x100092 kqread slaacd
27347 12424 1 0 3 0x100080 kqread slaacd
17915 225322 0 0 3 0x14200 pgzero zerothread
11251 244396 0 0 3 0x14200 aiodoned aiodoned
87688 217902 0 0 3 0x14200 biowait update
40876 491170 0 0 3 0x14200 cleaner cleaner
44830 212355 0 0 3 0x14200 reaper reaper
46683 273930 0 0 3 0x14200 pgdaemon pagedaemon
44658 103575 0 0 3 0x14200 bored crynlk
42415 319793 0 0 3 0x14200 bored crypto
47079 305999 0 0 3 0x14200 bored sensors
59155 437520 0 0 3 0x14200 usbtsk usbtask
*98098 125358 0 0 7 0x14200 usbatsk
-81352or375961 0 0 3 0x14200 apmev apm0
18033 252886 0 0 3 0x14200 bored softnet
22122 233491 0 0 3 0x14200 bored systqmp
49005 141234 0 0 3 0x14200 bored systq
86324 264951 0 0 3 0x40014200 bored softclock
23945 31167 0 0 3 0x40014200 idle0
50608 6288 0 0 3 0x14200 bored smr
95605 198109 0 0 3 0x14200 kmalloc kmthread
1 394155 0 0 3 0x82 wait init
0 0 -1 0 3 0x10200 scheduler swapper
ddb> show uvm
Current UVM status:
pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
58510 VM pages: 13306 active, 984 inactive, 0 wired, 26309 free (3299 zero)
min 10% (25) anon, 10% (25) vnode, 5% (12) vtext
freemin=1950, free-target=2600, inactive-target=0, wired-max=19503
faults=10200852, traps=22512570, intrs=30008005, ctxswitch=28752503 fpuswitch
=10557
softint=26097176, syscalls=27596327, kmapent=21
fault counts:
noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
> show bcstats
Current Buffer Cache status:
numbufs 849 busymapped 13, delwri 32
kvaslots 731 avail kva slots 718
bufpages 11626, dmapages 11626, dirtypages 188
pendingreads 4, pendingwrites 9
highflips 0, highflops 0, dmaflips 0
ddb> show registers
ds 0x10
es 0x10
fs 0x20
gs 0
edi 0xd0b39e8d cmd0646_9_tim_udma+0x61fd
esi 0x100
ebp 0xf34f4ddc
ebx 0xf34f4e04
edx 0x2
ecx 0
eax 0x1
eip 0xd04734e4 db_enter+0x4
cs 0x8
eflags 0x200202 __kernel_base_phys+0x12e
esp 0xf34f4ddc
ss 0x10
db_enter+0x4: popl %ebp
> Is it the disk? I did a complete dd read and a complete dd write
> before I put it to use, and it didn't show any problem. No signs
> of failed io in messages either.
>
> Is it the USB cable? I replaced it a couple of times.
> Is it the USB interface on the ALIX?
> Is it the small amount of memory? I tried using openrsync,
> which would fail on malloc (it mmaps the files).
> Could it be the rsync over ssh? (I will try to run just
> local backup for some time, without rsyncing the others).
How can I debug this further?
Jan
$ dmesg
OpenBSD 6.6-current (GENERIC) #492: Sun Jan 12 20:10:07 MST 2020
[email protected]:/usr/src/sys/arch/i386/compile/GENERIC
real mem = 259207168 (247MB)
avail mem = 238825472 (227MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: date 07/19/10, BIOS32 rev. 0 @ 0xfa950
apm0 at bios0: Power Management spec V1.2 (slowidle)
pcibios0 at bios0: rev 2.1 @ 0xf0000/0xdfb4
pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfdf30/128 (6 entries)
pcibios0: PCI Exclusive IRQs: 5 10 11
pcibios0: no compatible PCI ICU found: ICU vendor 0x1022 product 0x2090
pcibios0: Warning, unable to fix up PCI interrupt routing
pcibios0: PCI bus #0 is the last bus
bios0: ROM list: 0xc0000/0x8000 0xc8000/0xa800 0xef000/0x1000!
cpu0 at mainbus0: (uniprocessor)
cpu0: Geode(TM) Integrated Processor by AMD PCS ("AuthenticAMD" 586-class) 499
MHz, 05-0a-02
cpu0: FPU,DE,PSE,TSC,MSR,CX8,SEP,PGE,CMOV,CFLUSH,MMX,MMXX,3DNOW2,3DNOW
mtrr: K6-family MTRR support (2 registers)
amdmsr0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 1 function 0 "AMD Geode LX" rev 0x33
vga1 at pci0 dev 1 function 1 "AMD Geode LX Video" rev 0x00
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
glxsb0 at pci0 dev 1 function 2 "AMD Geode LX Crypto" rev 0x00: RNG AES
vr0 at pci0 dev 13 function 0 "VIA VT6105M RhineIII" rev 0x96: irq 11, address
00:0d:b9:0e:9e:f4
ukphy0 at vr0 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI 0x004063,
model 0x0034
glxpcib0 at pci0 dev 15 function 0 "AMD CS5536 ISA" rev 0x03: rev 3, 32-bit
3579545Hz timer, watchdog, gpio, i2c
gpio0 at glxpcib0: 32 pins
iic0 at glxpcib0
pciide0 at pci0 dev 15 function 2 "AMD CS5536 IDE" rev 0x01: DMA, channel 0
wired to compatibility, channel 1 wired to compatibility
wd0 at pciide0 channel 0 drive 0: <ELITE PRO CF CARD 8GB>
wd0: 1-sector PIO, LBA, 7279MB, 14909328 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
pciide0: channel 1 ignored (disabled)
auglx0 at pci0 dev 15 function 3 "AMD CS5536 Audio" rev 0x01: irq 11, CS5536
AC97
ac97: codec id 0x414c4770 (Avance Logic ALC203 rev 0)
ac97: codec features headphone, 20 bit DAC, 18 bit ADC, No 3D Stereo
audio0 at auglx0
ohci0 at pci0 dev 15 function 4 "AMD CS5536 USB" rev 0x02: irq 5, version 1.0,
legacy support
ehci0 at pci0 dev 15 function 5 "AMD CS5536 USB" rev 0x02: irq 5
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 configuration 1 interface 0 "AMD EHCI root hub" rev 2.00/1.00
addr 1
isa0 at glxpcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
lpt0 at isa0 port 0x378/4 irq 7
wbsio0 at isa0 port 0x2e/2: W83627HF rev 0x41
lm1 at wbsio0 port 0x290/8: W83627HF
npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
usb1 at ohci0: USB revision 1.0
uhub1 at usb1 configuration 1 interface 0 "AMD OHCI root hub" rev 1.00/1.00
addr 1
umass0 at uhub0 port 3 configuration 1 interface 0 "JMicron USB to ATA/ATAPI
Bridge" rev 2.00/1.00 addr 2
umass0: using SCSI over Bulk-Only
scsibus1 at umass0: 2 targets, initiator 0
sd0 at scsibus1 targ 1 lun 0: <WDC WD10, TPVT-00HT5T0, >
serial.152d2339DA0AC90273FF
sd0: 953869MB, 512 bytes/sector, 1953525168 sectors
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on wd0a (bf940e6c7aaf2c50.a) swap on wd0b dump on wd0b