* Juan Francisco Cantero Hurtado <[email protected]> on [24-01-2019 00:09:17 +0100]:
> On Wed, Jan 23, 2019 at 07:47:16PM +0100, Thuban wrote:
> > * Juan Francisco Cantero Hurtado <[email protected]> on [20-01-2019
> > 23:39:45 +0100]:
> > > On Sun, Jan 20, 2019 at 07:24:44PM +0100, Karel Gardas wrote:
> > > >
> > > > Based on my experience, softdep may be very fragile on a fast system
> > > > with a slow drive. I guess what you see may be softdep's own issue.
> > >
> > > The panic is in softdep, but he also had a crash without softdep.
> > >
> >
> > True.
> > Without softdep, I can't get access to the console (or didn't manage
> > to).
> >
> > I ran fsck -yf on the disk on an amd64 machine as suggested.
> > (The crash happens on arm64, though.)
>
> I suggested amd64 because the arch has more stable and widely tested
> drivers.
>
Ok, that makes sense.

> > Everything seems ok.
> >
> > moria# fsck -fy /dev/sd2a
> > ** /dev/rsd2a
> > ** File system is already clean
> > ** Last Mounted on /vol/Samsung SSD 850-p1
> > ** Phase 1 - Check Blocks and Sizes
> > ** Phase 2 - Check Pathnames
> > ** Phase 3 - Check Connectivity
> > ** Phase 4 - Check Reference Counts
> > ** Phase 5 - Check Cyl groups
> > 20863 files, 1492748 used, 37517771 free (739 frags, 4689629 blocks, 0.0% fragme
> > moria# fsck -fy /dev/sd2d
> > ** /dev/rsd2d
> > ** File system is already clean
> > ** Last Mounted on /vol/Samsung SSD 850-p2
> > ** Phase 1 - Check Blocks and Sizes
> > ** Phase 2 - Check Pathnames
> > ** Phase 3 - Check Connectivity
> > ** Phase 4 - Check Reference Counts
> > ** Phase 5 - Check Cyl groups
> > 4 files, 4 used, 1034203 free (35 frags, 129271 blocks, 0.0% fragmentation)
> > moria# fsck -fy /dev/sd2e
> > ** /dev/rsd2e
> > ** File system is already clean
> > ** Last Mounted on /vol/Samsung SSD 850-p3
> > ** Phase 1 - Check Blocks and Sizes
> > ** Phase 2 - Check Pathnames
> > ** Phase 3 - Check Connectivity
> > ** Phase 4 - Check Reference Counts
> > ** Phase 5 - Check Cyl groups
> > 26838 files, 2769562 used, 98993405 free (381 frags, 12374128 blocks, 0.0% fragmentation)
> >
> >
> > I had to crash the server at least 5 times to get access to the
> > console. Otherwise, all I was able to do was a hard reboot.
> >
> > Here is the ddb output after a new crash.
> >
> > /var: got error 5 while accessing filesystem
> > panic: softdep_deallocate_dependencies: unrecovered I/O error
>
> "unrecovered I/O error" sometimes happens due to a bad sector. Use amd64
> to fill the SSD with zeroes using dd and also use nick's suggestion from
> this thread:
>
> http://openbsd-archive.7691.n7.nabble.com/ahci-error-during-install-of-6-4-td357865.html
>
> The dmesg will show if you have bad sectors or not.
>
I did this (boy, that's very slow).
At first it seemed to solve the problem, but I had a new crash this morning while transferring big files.

> >
> > Stopped at panic+0x154:
> >     TID    PID  UID  PRFLAGS  PFLAGS  CPU  COMMAND
> >  161990  94125    0  0x14000   0x200    0  zerothread
> > *365345   9753    0  c+0x150
> > panic() at brelse+0xc4
> > brelse() at sd_buf_done+0x124
>
> You're mounting the partitions from fstab, at the start of the init
> process. The reason you can only boot with softdep is that softdep
> defers the write of the problematic blocks. sd_buf_done() is part of
> softdep and brelse() needs to write to the disk.
>

I can boot without softdep. However, I don't get the kernel panic and the ddb> prompt on the console at every crash, which makes gathering relevant information more difficult.

After the last crash, I tried to copy big files to another hard drive (**not an SSD**). I had the same issue.

I wonder whether the bug is in ehci. In the NetBSD manpage [1], you can read the following, which reminds me of what happens with my disk under OpenBSD:

> BUGS
>      The support for hubs that are connected with high speed upstream and low or
>      full speed downstream (i.e., for transaction translators) is limited.

Maybe it's just a coincidence. I'm thinking about this because just before the crash, I can see these messages (only sometimes, since I can't always access the console):

ehci_sync_hc: tsleep() = 35
ehci_sync_hc: tsleep() = 35
ehci_sync_hc: tsleep() = 35

Regards.

[1] https://man.openbsd.org/NetBSD-7.0.1/ehci.4
