On 02/22/2017 12:24 PM, Sanka Coffie wrote:
> Then I did a clean reboot and ran fsck manually, which produced all of
> the output below. The rest of this e-mail is just output from fsck.
>
> # fsck
> ** /dev/sd0a (402af328685601ff.a) (NO WRITE)
> ** Last Mounted on /
> ** Root file system
> ** Phase 1 - Check Blocks and Sizes
> ** Phase 2 - Check Pathnames
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> ** Phase 5 - Check Cyl groups
> 1747 files, 22931 used, 493332 free (84 frags, 61656 blocks, 0.0% fragmentation)
> ** /dev/sd1a (6ad7e2b138d393b2.a) (NO WRITE)
> ** File system is clean; not checking
> ** /dev/sd1e (6ad7e2b138d393b2.e) (NO WRITE)
> ** File system is clean; not checking
> ** /dev/sd1d (6ad7e2b138d393b2.d) (NO WRITE)
> ** Last Mounted on /tmp
> ** Phase 1 - Check Blocks and Sizes
> CANNOT READ: BLK 2016
> CONTINUE? [Fyn?] y
> ahci0: ncq error: 6 0 41 84
> ahci0: NCQ errored slot 6 is idle (00000004 active)
So, no matter what's happening, it always says command slot 6 was the one that failed. If you run fsck again, does it fail at the same block numbers, or are they more or less random?
I wonder if the SSD is misreporting its queue depth, so we shove too many commands at it and it doesn't know how to report that properly.
What output do you get with this diff: https://mild.embarrassm.net/~jonathan/t/atascsi-qdepth.diff (apply in src/sys/dev/ata)?
