usbdevs -vv in case it is relevaant (it is an umass):
Controller /dev/usb0:
addr 01: 1022:0000 AMD, EHCI root hub
high speed, self powered, config 1, rev 1.00
driver: uhub0
port 01: 0000.0500 power
port 02: 0000.0500 power
port 03: 0000.0503 connect enabled power
port 04: 0000.0500 power
addr 02: 152d:2339 JMicron, USB to ATA/ATAPI Bridge
high speed, power 2 mA, config 1, rev 1.00, iSerial 152D203380B6
driver: umass0
Controller /dev/usb1:
addr 01: 1022:0000 AMD, OHCI root hub
full speed, self powered, config 1, rev 1.00
driver: uhub1
port 01: 0000.0100 power
port 02: 0000.0100 power
port 03: 0000.0100 power
port 04: 0000.0100 power
On Jan 14 09:52:20, [email protected] wrote:
> OK, cereal now.
>
> On Jan 08 11:47:25, [email protected] wrote:
> > This is 6.6-current on an ALIX (dmesg below), serving as my home server.
> > It's where I store the daily.local dumps, into /backup on a big USB disk.
> > The machine often crashes into a ddb when saving those dumps.
>
> This time, it panicked on 'ls /backup'
> while remote machines were rsyncing their dumps into /backup.
>
> During normal operation:
>
> $ cat /etc/fstab
> bf940e6c7aaf2c50.a / ffs rw 1 1
> bf940e6c7aaf2c50.d /usr ffs rw,nodev,noatime 1 2
> bf940e6c7aaf2c50.e /usr/local ffs rw,nodev,noatime,wxallowed 1 2
> bf940e6c7aaf2c50.f /var ffs rw,nodev,noatime,nosuid 1 2
> bf940e6c7aaf2c50.g /var/log ffs rw,nodev,noatime,nosuid 1 2
> bf940e6c7aaf2c50.h /var/www ffs rw,nodev,noatime,nosuid 1 2
> bf940e6c7aaf2c50.i /var/mail ffs rw,nodev,noatime,nosuid 1 2
> bf940e6c7aaf2c50.j /var/postgresql ffs rw,nodev,noatime,nosuid 1 2
> bf940e6c7aaf2c50.k /tmp ffs rw,nodev,noatime,nosuid 1 2
> bf940e6c7aaf2c50.l /home ffs rw,nodev,noatime,nosuid 1 2
> 31efc37173c4c30f.a /backup ffs rw,nodev,noatime,nosuid
>
> $ mount -v
> /dev/wd0a (bf940e6c7aaf2c50.a) on / type ffs (rw, local, ctime=Tue Jan 14
> 08:24:40 2020)
> /dev/wd0d (bf940e6c7aaf2c50.d) on /usr type ffs (rw, local, noatime, nodev,
> ctime=Tue Jan 14 08:24:30 2020)
> /dev/wd0e (bf940e6c7aaf2c50.e) on /usr/local type ffs (rw, local, noatime,
> nodev, wxallowed, ctime=Tue Jan 14 08:24:30 2020)
> /dev/wd0f (bf940e6c7aaf2c50.f) on /var type ffs (rw, local, noatime, nodev,
> nosuid, ctime=Tue Jan 14 08:24:30 2020)
> /dev/wd0g (bf940e6c7aaf2c50.g) on /var/log type ffs (rw, local, noatime,
> nodev, nosuid, ctime=Tue Jan 14 08:24:30 2020)
> /dev/wd0h (bf940e6c7aaf2c50.h) on /var/www type ffs (rw, local, noatime,
> nodev, nosuid, ctime=Tue Jan 14 08:24:30 2020)
> /dev/wd0i (bf940e6c7aaf2c50.i) on /var/mail type ffs (rw, local, noatime,
> nodev, nosuid, ctime=Tue Jan 14 08:24:30 2020)
> /dev/wd0j (bf940e6c7aaf2c50.j) on /var/postgresql type ffs (rw, local,
> noatime, nodev, nosuid, ctime=Tue Jan 14 08:24:30 2020)
> /dev/wd0k (bf940e6c7aaf2c50.k) on /tmp type ffs (rw, local, noatime, nodev,
> nosuid, ctime=Tue Jan 14 08:24:30 2020)
> /dev/wd0l (bf940e6c7aaf2c50.l) on /home type ffs (rw, local, noatime, nodev,
> nosuid, ctime=Tue Jan 14 08:24:30 2020)
> /dev/sd0a (31efc37173c4c30f.a) on /backup type ffs (rw, local, noatime,
> nodev, nosuid, ctime=Tue Jan 14 09:01:34 2020)
>
> $ df -h
> Filesystem Size Used Avail Capacity Mounted on
> /dev/wd0a 254M 49.2M 192M 20% /
> /dev/wd0d 1008M 484M 474M 50% /usr
> /dev/wd0e 1.5G 386M 1.0G 27% /usr/local
> /dev/wd0f 131M 9.5M 115M 8% /var
> /dev/wd0g 131M 123M 930K 99% /var/log
> /dev/wd0h 2.0G 1.4G 481M 75% /var/www
> /dev/wd0i 131M 53.1M 70.9M 43% /var/mail
> /dev/wd0j 131M 2.0K 124M 0% /var/postgresql
> /dev/wd0k 131M 4.0K 124M 0% /tmp
> /dev/wd0l 1.7G 1.4G 230M 86% /home
> /dev/sd0a 931G 135G 750G 15% /backup
>
> > It always happens during the nightly heavy writes.
> > The machine itself is dumping local fs's into /backup,
> > and a few other machines rsync their dumps into there.
> > All of that is run from daily.local (on each of the machines).
>
> This is the panic:
>
> panic: ehci_device_clear_toggle: queue active
> Stopped at db_enter+0x4: popl %ebp
> TID PID UID PRFLAGS PFLAGS CPU COMMAND
> *125358 98098 0 0x14000 0x200 0 usbatsk
> db_enter() at db_enter+0x4
> panic(d0b39e8d) at panic+0xcc
> ehci_device_clear_toggle(d1923b80) at ehci_device_clear_toggle+0x25
> umass_bbb_state(d18e0640,d1954e00,f) at umass_bbb_state+0x14b
> usb_transfer_complete(d18e0640) at usb_transfer_complete+0x1b7
> ehci_abort_xfer(d18e0640,f) at ehci_abort_xferpanic:
> ehci_device_clear_toggle: queue active
> Stopped at db_enter+0x4: popl %ebp
> TID PID UID PRFLAGS PFLAGS CPU COMMAND
> *125358 98098 0 0x14000 0x200 0 usbatsk
> db_enter() at db_enter+0x4
> panic(d0b39e8d) at panic+0xcc
> ehci_d
>
> ddb> trace
> db_enter() at db_enter+0x4
> panic(d0b39e8d) at panic+0xcc
> ehci_device_clear_toggle(d1923b80) at ehci_device_clear_toggle+0x25
> umass_bbb_state(d18e0640,d1954e00,f) at umass_bbb_state+0x14b
> usb_transfer_complete(d18e0640) at usb_transfer_complete+0x1b7
> ehci_abort_xfer(d18e0640,f) at ehci_abort_xfer+0x2a5
> ehci_timeout_task(d18e0640) at ehci_timeout_task+0x1c
> usb_abort_task_thread(d18dcbe0) at usb_abort_task_thread+0x98
>
> ddb> ps
> PID TID PPID UID S FLAGS WAIT COMMAND
> 10274 303125 2215 1000 3 0x100003 biowait ksh
> 2215 479501 10522 1000 3 0x90 select sshd
> 10522 375696 65433 0 3 0x92 poll sshd
> 2783 186167 65433 0 3 0x82 netio sshd
> 16791 175461 32411 0 3 0x100082 nanosleep sleep
> 32411 408555 86211 0 3 0x10008a pause sh
> 86211 373939 43164 0 3 0x100090 piperd cron
> 74312 511446 55156 1000 3 0 inode rsync
> 55156 267723 17081 1000 3 0x82 select rsync
> 17081 109318 40374 1000 3 0x10008a pause ksh
> 40374 348103 35069 1000 3 0x90 select sshd
> 35069 263033 65433 0 3 0x92 poll sshd
> 95492 378695 16393 1000 3 0 biowait rsync
> 16393 275467 69978 1000 3 0x82 select rsync
> 69978 69676 33879 1000 3 0x10008a pause ksh
> 33879 207896 3716 1000 3 0x90 select sshd
> 3716 108288 65433 0 3 0x92 poll sshd
> 52665 366181 83150 1000 3 0 biowait rsync
> 83150 258754 3331 1000 3 0x82 select rsync
> 3331 23380 76282 1000 3 0x10008a pause ksh
> 76282 30302 9526 1000 3 0x90 select sshd
> --9526ore24952 65433 0 3 0x92 poll sshd
> 65257 154470 1 0 3 0x100083 ttyin getty
> 5060 100746 1 0 3 0x100083 ttyin getty
> 36162 158059 1 0 3 0x100083 ttyin getty
> 41188 123069 1 0 3 0x100083 ttyin getty
> 29419 55117 1 0 3 0x100083 ttyin getty
> 96178 161040 1 0 3 0x100083 ttyin getty
> 43164 323504 1 0 3 0x100098 poll cron
> 60842 485881 1 62 3 0x100090 bpf spamlogd
> 91511 190579 6152 62 3 0x100090 piperd spamd
> 19596 269263 6152 62 3 0x100090 poll spamd
> 6152 35793 1 62 3 0x100090 nanosleep spamd
> 86910 304526 1 0 3 0x100080 kqread httpd
> 99066 107831 1 67 3 0x100092 kqread httpd
> 42371 129422 1 67 3 0x100092 kqread httpd
> 35161 459075 1 67 3 0x100092 kqread httpd
> 57014 239036 1 67 3 0x100092 kqread httpd
> 63252 238046 916 95 3 0x100092 kqread smtpd
> 56940 1192 916 103 3 0x100092 kqread smtpd
> 80613 106178 916 95 3 0x100092 kqread smtpd
> 57181 187925 916 95 3 0x100092 kqread smtpd
> 63175 460639 916 95 3 0x100092 kqread smtpd
> 3873 380642 916 95 3 0x100092 kqread smtpd
> --d916or494473 1 0 3 0x100080 kqread smtpd
> 65433 76251 1 0 3 0x80 select sshd
> 88600 110121 1 0 3 0x100080 poll ntpd
> 19297 195954 20100 83 3 0x100092 poll ntpd
> 20100 17660 1 83 3 0x100092 poll ntpd
> 31191 85959 26397 74 3 0x100092 bpf pflogd
> 26397 187118 1 0 3 0x80 netio pflogd
> 68309 448011 94742 73 3 0x100090 kqread syslogd
> 94742 69614 1 0 3 0x100082 netio syslogd
> 53492 243971 27347 115 3 0x100092 kqread slaacd
> 52858 458237 27347 115 3 0x100092 kqread slaacd
> 27347 12424 1 0 3 0x100080 kqread slaacd
> 17915 225322 0 0 3 0x14200 pgzero zerothread
> 11251 244396 0 0 3 0x14200 aiodoned aiodoned
> 87688 217902 0 0 3 0x14200 biowait update
> 40876 491170 0 0 3 0x14200 cleaner cleaner
> 44830 212355 0 0 3 0x14200 reaper reaper
> 46683 273930 0 0 3 0x14200 pgdaemon pagedaemon
> 44658 103575 0 0 3 0x14200 bored crynlk
> 42415 319793 0 0 3 0x14200 bored crypto
> 47079 305999 0 0 3 0x14200 bored sensors
> 59155 437520 0 0 3 0x14200 usbtsk usbtask
> *98098 125358 0 0 7 0x14200 usbatsk
> -81352or375961 0 0 3 0x14200 apmev apm0
> 18033 252886 0 0 3 0x14200 bored softnet
> 22122 233491 0 0 3 0x14200 bored systqmp
> 49005 141234 0 0 3 0x14200 bored systq
> 86324 264951 0 0 3 0x40014200 bored softclock
> 23945 31167 0 0 3 0x40014200 idle0
> 50608 6288 0 0 3 0x14200 bored smr
> 95605 198109 0 0 3 0x14200 kmalloc kmthread
> 1 394155 0 0 3 0x82 wait init
> 0 0 -1 0 3 0x10200 scheduler swapper
>
> ddb> show uvm
> Current UVM status:
> pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
> 58510 VM pages: 13306 active, 984 inactive, 0 wired, 26309 free (3299 zero)
> min 10% (25) anon, 10% (25) vnode, 5% (12) vtext
> freemin=1950, free-target=2600, inactive-target=0, wired-max=19503
> faults=10200852, traps=22512570, intrs=30008005, ctxswitch=28752503
> fpuswitch
> =10557
> softint=26097176, syscalls=27596327, kmapent=21
> fault counts:
> noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
>
> > show bcstats
> Current Buffer Cache status:
> numbufs 849 busymapped 13, delwri 32
> kvaslots 731 avail kva slots 718
> bufpages 11626, dmapages 11626, dirtypages 188
> pendingreads 4, pendingwrites 9
> highflips 0, highflops 0, dmaflips 0
>
> ddb> show registers
> ds 0x10
> es 0x10
> fs 0x20
> gs 0
> edi 0xd0b39e8d cmd0646_9_tim_udma+0x61fd
> esi 0x100
> ebp 0xf34f4ddc
> ebx 0xf34f4e04
> edx 0x2
> ecx 0
> eax 0x1
> eip 0xd04734e4 db_enter+0x4
> cs 0x8
> eflags 0x200202 __kernel_base_phys+0x12e
> esp 0xf34f4ddc
> ss 0x10
> db_enter+0x4: popl %ebp
>
>
> > Is it the disk? I did a complete dd read and a complete dd write
> > before I put it to use, and it didn't show any problem. No signs
> > of failed io in messages either.
> >
> > Is it the USB cable? I replaced it a couple of times.
> > Is it the USB interface on the ALIX?
>
> > Is it the small amount of memory? I tried using openrsync,
> > which would fail on malloc (it mmaps the files).
> > Could it be the rsync over ssh? (I will try to run just
> > local backup for some time, without rsyncing the others).
>
>
> How can I debug this further?
>
> Jan
>
>
> $ dmesg
> OpenBSD 6.6-current (GENERIC) #492: Sun Jan 12 20:10:07 MST 2020
> [email protected]:/usr/src/sys/arch/i386/compile/GENERIC
> real mem = 259207168 (247MB)
> avail mem = 238825472 (227MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: date 07/19/10, BIOS32 rev. 0 @ 0xfa950
> apm0 at bios0: Power Management spec V1.2 (slowidle)
> pcibios0 at bios0: rev 2.1 @ 0xf0000/0xdfb4
> pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xfdf30/128 (6 entries)
> pcibios0: PCI Exclusive IRQs: 5 10 11
> pcibios0: no compatible PCI ICU found: ICU vendor 0x1022 product 0x2090
> pcibios0: Warning, unable to fix up PCI interrupt routing
> pcibios0: PCI bus #0 is the last bus
> bios0: ROM list: 0xc0000/0x8000 0xc8000/0xa800 0xef000/0x1000!
> cpu0 at mainbus0: (uniprocessor)
> cpu0: Geode(TM) Integrated Processor by AMD PCS ("AuthenticAMD" 586-class)
> 499 MHz, 05-0a-02
> cpu0: FPU,DE,PSE,TSC,MSR,CX8,SEP,PGE,CMOV,CFLUSH,MMX,MMXX,3DNOW2,3DNOW
> mtrr: K6-family MTRR support (2 registers)
> amdmsr0 at mainbus0
> pci0 at mainbus0 bus 0: configuration mode 1 (bios)
> pchb0 at pci0 dev 1 function 0 "AMD Geode LX" rev 0x33
> vga1 at pci0 dev 1 function 1 "AMD Geode LX Video" rev 0x00
> wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
> wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
> glxsb0 at pci0 dev 1 function 2 "AMD Geode LX Crypto" rev 0x00: RNG AES
> vr0 at pci0 dev 13 function 0 "VIA VT6105M RhineIII" rev 0x96: irq 11,
> address 00:0d:b9:0e:9e:f4
> ukphy0 at vr0 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI
> 0x004063, model 0x0034
> glxpcib0 at pci0 dev 15 function 0 "AMD CS5536 ISA" rev 0x03: rev 3, 32-bit
> 3579545Hz timer, watchdog, gpio, i2c
> gpio0 at glxpcib0: 32 pins
> iic0 at glxpcib0
> pciide0 at pci0 dev 15 function 2 "AMD CS5536 IDE" rev 0x01: DMA, channel 0
> wired to compatibility, channel 1 wired to compatibility
> wd0 at pciide0 channel 0 drive 0: <ELITE PRO CF CARD 8GB>
> wd0: 1-sector PIO, LBA, 7279MB, 14909328 sectors
> wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
> pciide0: channel 1 ignored (disabled)
> auglx0 at pci0 dev 15 function 3 "AMD CS5536 Audio" rev 0x01: irq 11, CS5536
> AC97
> ac97: codec id 0x414c4770 (Avance Logic ALC203 rev 0)
> ac97: codec features headphone, 20 bit DAC, 18 bit ADC, No 3D Stereo
> audio0 at auglx0
> ohci0 at pci0 dev 15 function 4 "AMD CS5536 USB" rev 0x02: irq 5, version
> 1.0, legacy support
> ehci0 at pci0 dev 15 function 5 "AMD CS5536 USB" rev 0x02: irq 5
> usb0 at ehci0: USB revision 2.0
> uhub0 at usb0 configuration 1 interface 0 "AMD EHCI root hub" rev 2.00/1.00
> addr 1
> isa0 at glxpcib0
> isadma0 at isa0
> com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> com0: console
> com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
> pckbc0 at isa0 port 0x60/5 irq 1 irq 12
> pckbd0 at pckbc0 (kbd slot)
> wskbd0 at pckbd0: console keyboard, using wsdisplay0
> pcppi0 at isa0 port 0x61
> spkr0 at pcppi0
> lpt0 at isa0 port 0x378/4 irq 7
> wbsio0 at isa0 port 0x2e/2: W83627HF rev 0x41
> lm1 at wbsio0 port 0x290/8: W83627HF
> npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
> usb1 at ohci0: USB revision 1.0
> uhub1 at usb1 configuration 1 interface 0 "AMD OHCI root hub" rev 1.00/1.00
> addr 1
> umass0 at uhub0 port 3 configuration 1 interface 0 "JMicron USB to ATA/ATAPI
> Bridge" rev 2.00/1.00 addr 2
> umass0: using SCSI over Bulk-Only
> scsibus1 at umass0: 2 targets, initiator 0
> sd0 at scsibus1 targ 1 lun 0: <WDC WD10, TPVT-00HT5T0, >
> serial.152d2339DA0AC90273FF
> sd0: 953869MB, 512 bytes/sector, 1953525168 sectors
> vscsi0 at root
> scsibus2 at vscsi0: 256 targets
> softraid0 at root
> scsibus3 at softraid0: 256 targets
> root on wd0a (bf940e6c7aaf2c50.a) swap on wd0b dump on wd0b