* Juan Francisco Cantero Hurtado <[email protected]> on [24-01-2019 00:09:17 +0100]:
> On Wed, Jan 23, 2019 at 07:47:16PM +0100, Thuban wrote:
> > * Juan Francisco Cantero Hurtado <[email protected]> on [20-01-2019 23:39:45 +0100]:
> > > On Sun, Jan 20, 2019 at 07:24:44PM +0100, Karel Gardas wrote:
> > > > 
> > > > Based on my experience, softdep may be very fragile on a fast system
> > > > with a slow drive. I guess what you see may be softdep's own issue.
> > > 
> > > The panic is in softdep but he also had a crash without softdep.
> > > 
> > 
> > True.
> > Without softdep, I can't get access to the console (or didn't manage
> > to).
> > 
> > I ran fsck -yf on the disk on an amd64 machine as suggested.
> > (The crash happens on arm64, though.)
> 
> I suggested amd64 because the arch has more stable and widely tested
> drivers.
> 

ok, it makes sense.

> > Everything seems ok.
> > 
> >     moria# fsck -fy /dev/sd2a
> >     ** /dev/rsd2a
> >     ** File system is already clean
> >     ** Last Mounted on /vol/Samsung SSD 850-p1
> >     ** Phase 1 - Check Blocks and Sizes
> >     ** Phase 2 - Check Pathnames
> >     ** Phase 3 - Check Connectivity
> >     ** Phase 4 - Check Reference Counts
> >     ** Phase 5 - Check Cyl groups
> >     20863 files, 1492748 used, 37517771 free (739 frags, 4689629 blocks, 0.0% fragme
> >     moria# fsck -fy /dev/sd2d 
> >     ** /dev/rsd2d
> >     ** File system is already clean
> >     ** Last Mounted on /vol/Samsung SSD 850-p2
> >     ** Phase 1 - Check Blocks and Sizes
> >     ** Phase 2 - Check Pathnames
> >     ** Phase 3 - Check Connectivity
> >     ** Phase 4 - Check Reference Counts
> >     ** Phase 5 - Check Cyl groups
> >     4 files, 4 used, 1034203 free (35 frags, 129271 blocks, 0.0% fragmentation)
> >     moria# fsck -fy /dev/sd2e 
> >     ** /dev/rsd2e
> >     ** File system is already clean
> >     ** Last Mounted on /vol/Samsung SSD 850-p3
> >     ** Phase 1 - Check Blocks and Sizes
> >     ** Phase 2 - Check Pathnames
> >     ** Phase 3 - Check Connectivity
> >     ** Phase 4 - Check Reference Counts
> >     ** Phase 5 - Check Cyl groups
> >     26838 files, 2769562 used, 98993405 free (381 frags, 12374128 blocks, 0.0% fragmentation)
> > 
> > 
> > I had to crash the server at least 5 times to get access to the
> > console. Otherwise, all I could do was a hard reboot.
> > 
> > Here is the ddb output after a new crash.
> > 
> >     /var: got error 5 while accessing filesystem
> >     panic: softdep_deallocate_dependencies: unrecovered I/O error
> 
> "unrecovered I/O error" sometimes happens due to a bad sector. Use amd64
> to fill the SSD with zeroes using dd and use also nick's suggestion from
> this thread:
> 
> http://openbsd-archive.7691.n7.nabble.com/ahci-error-during-install-of-6-4-td357865.html
> 
> The dmesg will show if you have bad sectors or not.
> 

I did this (boy, that's very slow). At first it seemed to solve the
problem, but I had a new crash this morning when transferring big files.
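
For anyone following along, the zero-fill suggested above can be sketched
roughly as below. This is only an illustration, not the exact command used
here: the zero_fill helper and its arguments are hypothetical, and on a real
machine the target would be the raw whole-disk device (for example
/dev/rsd2c, assuming the disk attaches as sd2). It destroys everything on
the target.

```shell
# Sketch only: overwrite a target with zeroes using dd.
# zero_fill is a hypothetical helper, not a system command.
#   $1 = target path (a raw disk device in real use; this is destructive)
#   $2 = optional number of 1 MiB blocks to write
zero_fill() {
    target=$1
    blocks=$2
    if [ -n "$blocks" ]; then
        # Bounded write: exactly $blocks blocks of 1048576 bytes.
        dd if=/dev/zero of="$target" bs=1048576 count="$blocks"
    else
        # Unbounded write: dd keeps going until the end of the device.
        dd if=/dev/zero of="$target" bs=1048576
    fi
}
```

Called without a block count, dd keeps writing until it reaches the end of
the device. If it stops early with a write error, that points at the kind of
bad sector mentioned above, and dmesg should show it.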


> 
> 
> >     Stopped at      panic+0x154:        TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> >      161990  94125      0     0x14000      0x200    0  zerothread
> >     *365345   9753      0    c+0x150
> >     panic() at brelse+0xc4
> >     brelse() at sd_buf_done+0x124
> 
> You're mounting the partitions from fstab, at the start of the init
> process. The reason why you only can boot with softdep is because
> softdep defers the write of the problematic blocks. sd_buf_done() is
> part of softdep and brelse() needs to write to the disk.
> 

I can boot without softdep.
I don't get the kernel panic message and the ddb> prompt on the console at
every crash, which makes gathering relevant information more difficult.


After the last crash, I tried to copy big files to another hard drive
(**not SSD**). I had the same issue.
I wonder whether the bug is actually in ehci. The NetBSD manpage [1]
contains the following, which reminds me of what happens with my
disk under OpenBSD:

> BUGS
> The support for hubs that are connected with high speed upstream and low or 
> full speed downstream (i.e., for transaction translators) is limited.

Maybe it's just a coincidence.

I'm thinking about this because just before the crash, I can see these
messages (only sometimes, since I can't always access the console):

        ehci_sync_hc: tsleep() = 35
        ehci_sync_hc: tsleep() = 35
        ehci_sync_hc: tsleep() = 35

Regards.

[1] https://man.openbsd.org/NetBSD-7.0.1/ehci.4

