Hi there,

I have a RAID 1 mirror implemented with gmirror. We recently had some power issues at our data centre, after which fsck failed mysteriously. The server lost power unexpectedly, came back up for about a minute, and then lost power again. Shortly after the next boot, the following appeared in my /var/log/messages:

Feb 2 05:20:19 myserver fsck: /dev/mirror/gm0s1f: INCORRECT BLOCK COUNT I=777684 (8 should be 0) (CORRECTED)
Feb 2 05:20:19 myserver fsck: /dev/mirror/gm0s1f: CANNOT READ BLK: 12417184
Feb 2 05:20:19 myserver fsck: /dev/mirror/gm0s1f: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY.

gm0s1f is my /usr partition. This was followed by countless errors that look like

Feb 2 05:20:38 myserver ad6: TIMEOUT - READ_DMA retrying (1 retry left) LBA=29096879
Feb 2 05:20:43 myserver ad6: TIMEOUT - READ_DMA retrying (0 retries left) LBA=29096879
Feb 2 05:20:48 myserver ad6: FAILURE - READ_DMA timed out LBA=29096879
Feb 2 05:20:48 myserver g_vfs_done():mirror/gm0s1f[READ(offset=6357598208, length=16384)]error = 5

With these errors went any sort of remote access to the box. We had to get physical access, run fsck -y, and reboot before the machine could be put back into service.
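One thing I noticed while staring at the logs: the byte offset in the g_vfs_done() line appears to point at the same block that fsck said it could not read. Below is a minimal sketch of that arithmetic in Python. It assumes fsck's CANNOT READ BLK number is counted in 512-byte sectors (I have not confirmed this), and the helper name parse_g_vfs_done is just something I made up for illustration.

```python
import re

# Log line copied from the /var/log/messages excerpt above.
LINE = ("g_vfs_done():mirror/gm0s1f[READ(offset=6357598208, "
        "length=16384)]error = 5")

# Assumption: fsck reports block numbers in 512-byte sectors.
SECTOR_SIZE = 512


def parse_g_vfs_done(line):
    """Return (offset, length, error) as ints from a g_vfs_done() line."""
    m = re.search(r"offset=(\d+), length=(\d+)\)\]error = (\d+)", line)
    if m is None:
        raise ValueError("not a g_vfs_done() line")
    return tuple(int(x) for x in m.groups())


offset, length, error = parse_g_vfs_done(LINE)

# Convert the byte offset and length into sector units.
first_sector = offset // SECTOR_SIZE
sector_count = length // SECTOR_SIZE

print(first_sector, sector_count, error)  # prints: 12417184 32 5
```

If that assumption holds, 12417184 is exactly the block from the fsck message, so both messages would be complaining about the same 16 KB read on gm0s1f. The error value 5 is EIO (input/output error).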

Now my question is: why did fsck die on me? I thought that in this day and age file system corruption caused by power failures was repaired automatically upon reboot. Or is it possible that interrupting fsck itself caused the problem, when the system went down again after the very brief uptime in between?

I am really concerned about this, as it caused a lot of unnecessary downtime and I really don't want it to ever happen again. I know that solving the power issues is the real solution, but I want several layers of peace of mind.

Oh, and I run FreeBSD 6.2-RELEASE.

freebsd-questions@freebsd.org mailing list