On Wed, Jan 23, 2019 at 07:47:16PM +0100, Thuban wrote:
> * Juan Francisco Cantero Hurtado <[email protected]> on [20-01-2019 23:39:45 +0100]:
> > On Sun, Jan 20, 2019 at 07:24:44PM +0100, Karel Gardas wrote:
> > > 
> > > Based on my experience, softdep may be very fragile on a fast system
> > > with a slow drive. I guess what you see may be softdep's own issue.
> > 
> > The panic is in softdep, but he also had a crash without softdep.
> > 
> 
> True.
> Without softdep, I can't get access to the console (or didn't manage
> to).
> 
> I ran fsck -yf on the disk on an amd64 machine, as suggested
> (though the crash happens on arm64).

I suggested amd64 because that arch has more stable and more widely
tested drivers.

> Everything seems ok.
> 
>       moria# fsck -fy /dev/sd2a
>       ** /dev/rsd2a
>       ** File system is already clean
>       ** Last Mounted on /vol/Samsung SSD 850-p1
>       ** Phase 1 - Check Blocks and Sizes
>       ** Phase 2 - Check Pathnames
>       ** Phase 3 - Check Connectivity
>       ** Phase 4 - Check Reference Counts
>       ** Phase 5 - Check Cyl groups
>       20863 files, 1492748 used, 37517771 free (739 frags, 4689629 blocks, 0.0% fragme
>       moria# fsck -fy /dev/sd2d 
>       ** /dev/rsd2d
>       ** File system is already clean
>       ** Last Mounted on /vol/Samsung SSD 850-p2
>       ** Phase 1 - Check Blocks and Sizes
>       ** Phase 2 - Check Pathnames
>       ** Phase 3 - Check Connectivity
>       ** Phase 4 - Check Reference Counts
>       ** Phase 5 - Check Cyl groups
>       4 files, 4 used, 1034203 free (35 frags, 129271 blocks, 0.0% fragmentation)
>       moria# fsck -fy /dev/sd2e 
>       ** /dev/rsd2e
>       ** File system is already clean
>       ** Last Mounted on /vol/Samsung SSD 850-p3
>       ** Phase 1 - Check Blocks and Sizes
>       ** Phase 2 - Check Pathnames
>       ** Phase 3 - Check Connectivity
>       ** Phase 4 - Check Reference Counts
>       ** Phase 5 - Check Cyl groups
>       26838 files, 2769562 used, 98993405 free (381 frags, 12374128 blocks, 0.0% fragmentation)
> 
> 
> I had to crash the server at least 5 times to get access to the
> console. Otherwise, all I was able to do was a hard reboot.
> 
> Here is the ddb output after a new crash.
> 
>       /var: got error 5 while accessing filesystem
>       panic: softdep_deallocate_dependencies: unrecovered I/O error

"unrecovered I/O error" sometimes happens due to a bad sector (error 5
is EIO). Use the amd64 machine to fill the SSD with zeroes using dd,
and also follow nick's suggestion from this thread:

http://openbsd-archive.7691.n7.nabble.com/ahci-error-during-install-of-6-4-td357865.html

The dmesg will show whether you have bad sectors.
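
For the zero fill, something along these lines should do. This is only
a rough sketch: it assumes the SSD still attaches as sd2 on moria (as in
your fsck output), so the raw whole-disk device would be /dev/rsd2c.
The function names and the optional block count are mine, just for
illustration. Check the device name in dmesg before running it, because
it destroys everything on the disk.

```shell
# Sketch only: overwrite a target with zeroes using dd.
# ASSUMPTION: the SSD is sd2 on the amd64 box, i.e. /dev/rsd2c.
# WARNING: this destroys all data on the target.
zero_fill() {
    target=$1
    count=$2    # optional: number of 1 MiB blocks, handy for a scratch-file test
    if [ -n "$count" ]; then
        dd if=/dev/zero of="$target" bs=1048576 count="$count"
    else
        # Without a count, dd keeps writing until it hits the end of the
        # device and is expected to stop there with an error.
        dd if=/dev/zero of="$target" bs=1048576
    fi
}

# Show the kernel messages that mention the disk (e.g. "sd2"), to look
# for I/O errors reported during the fill.
disk_messages() {
    dmesg | grep "$1"
}

# Example (destructive, run as root):
#   zero_fill /dev/rsd2c
#   disk_messages sd2
```

If dd gets through the whole disk and dmesg stays quiet, a bad sector
becomes a less likely explanation.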



>       Stopped at      panic+0x154:        TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
>        161990  94125      0     0x14000      0x200    0  zerothread
>       *365345   9753      0    c+0x150
>       panic() at brelse+0xc4
>       brelse() at sd_buf_done+0x124

You're mounting the partitions from fstab, at the start of the init
process. You can only boot with softdep because softdep defers the
writes to the problematic blocks. sd_buf_done() is the completion
routine of the sd(4) driver; the buffer whose I/O failed ends up in
brelse(), and softdep panics because it still has dependencies
attached to that buffer.

If you need to save data from the drives, remove the partitions from
fstab and mount both manually with the "ro" mount option. If you're
lucky, the copy will not fail when the partitions are read-only.
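
As a rough sketch of those two steps (the device name /dev/sd0, the
mount point /mnt/rescue and the function names are hypothetical; use
whatever your fstab and dmesg show on the arm64 machine):

```shell
# Print a copy of an fstab with every entry for the given disk commented
# out, so its partitions are no longer mounted at boot.
# ASSUMPTION: the entries start with the device path or DUID passed in.
disable_in_fstab() {
    fstab=$1
    disk=$2     # hypothetical, e.g. /dev/sd0 or the disk's DUID
    sed "s|^${disk}|#&|" "$fstab"
}

# Mount one partition read-only by hand (needs root).
mount_ro() {
    mount -o ro "$1" "$2"
}

# Example with hypothetical names:
#   disable_in_fstab /etc/fstab /dev/sd0 > /tmp/fstab.new
#   (review /tmp/fstab.new, then install it as /etc/fstab and reboot)
#   mount_ro /dev/sd0a /mnt/rescue
```

Writing the edited fstab to a separate file first lets you check it
before replacing the real one.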


>       sd_buf_done() at scsi_done+0x34
>       scsi_done() at usb_transfer_complete+0x238
>       usb_transfer_complete() at ehci_abort_xfer+0x258
>       ehci_abort_xfer() at ehci_timeout_task+0x34
>       https://www.openbsd.
>       ddb{1}> 
> 
>       ddb{1}> trace
>       db_enter() at panic+0x150
>       panic() at brelse+0xc4
>       brelse() at sd_buf_done+0x124
>       sd_buf_done() at scsi_done+0x34
>       scsi_done() at usb_transfer_complete+0x238
>       usb_transfer_complete() at ehci_abort_xfer+0x258
>       ehci_abort_xfer() at ehci_timeout_task+0x34
>       ehci_timeout_task() at usb_abort_task_thread+0xcc
>       usb_abort_task_thread() at proc_trampoline+0x10
> 
>       ddb{1}> show panic
>       softdep_deallocate_dependencies: unrecovered I/O error
> 
> 
>       ddb{1}> show uvm
>       Current UVM status:
>         pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
>         988985 VM pages: 110183 active, 132343 inactive, 0 wired, 438054 free (52331 zero)
>         min  10% (25) anon, 10% (25) vnode, 5% (12) vtext
>         freemin=32966, free-target=43954, inactive-target=89584, wired-max=329661
>         faults=67079988, trap53705(153706), anget(retries)=20911670(0), amapcopy=14150766
>               neighbor anon/obj pg=1580327/34256987, gets(lock/unlock)=29150164/153708
>               cases: anon=19467283, anoncow=1444387, obj=26419482, prcopy=2730679, przero=17018267
>         daemon and swap counet=0
>               nswapdev=1
>               swpages=145355, swpginuse=0, swpgonly=0 paging=0
>         kernel pointers:
>               objs(kern)=0xffffff8000b10ed0
>       ddb{1}> 
> 
>       ddb{1}> show bcstats
>       Current Buffer Cache status:
>       numbufs 24906 busymapped 135, delwri 35
>       kvaslots 6553 avail kva slots 6418
>       bufpages 197730, dmapages 197730, dirtypages 380
>       pendingreads 2, pendingwrites 129
>       highflips 0, highflops 0, dmaflips 0
> 
> 
> Regards.
> 

-- 
Juan Francisco Cantero Hurtado http://juanfra.info
