On 2016-08-04 07:22, Ultima wrote:
> Hello,
> I recently had some issues with a PSU and ran several scrubs on a pool of
> around 35T. Random drives would drop and require a zpool online, which
> turned up checksum errors (as expected). However, after all the scrubs I
> ran, I think I may have found a bug in the zpool online resilvering process.
> 24 disks total, 4 vdevs raidz2 (6 drives each).
> Before this next part... I had a backup PSU, however it was also going bad
> and waiting for RMA. The current one seemed to be dying but ran fine with
> fewer drives. So I decided I would run the server short 4 drives.
> I started by offlining 4 drives from different vdevs (or they were already
> disconnected from the PSU), then ran a scrub to verify everything. Many
> checksum errors were present on some of the drives, but this was expected
> due to the faulty PSU. Then I offlined 4 different drives, onlined the
> other 4, and scrubbed once again. After the resilver, again, many checksum
> errors on these drives, as expected.
> After the scrub completed, I decided to offline 4 different drives, then
> online the ones that were out of the pool for a while. During the resilver,
> checksum errors were once again found. I was surprised given the recent
> scrub, so I decided to run another scrub, and it found even more checksum
> errors on these recently onlined drives. I didn't think much about it,
> however after the replacement PSU arrived, I onlined all the drives that
> were out of the pool and again the resilver found checksum errors, and a
> subsequent scrub found more.
> Is this issue known? Is it common for a scrub to be required after onlining
> a disk that was out of pool for some time?
> The drives are ST4000NM0033, and until recently have never had a single
> checksum error in their lifetime (at least with ZFS).
> FreeBSD S1 12.0-CURRENT FreeBSD 12.0-CURRENT #19 r303224: Sat Jul 23
> 10:41:12 EDT 2016
> root@S1:/usr/src/head/obj/usr/src/head/src/sys/MYKERNEL-NODEBUG
>  amd64
> Sorry for the wall of text, but I hope this helps in tracking down this
> possible bug.

Perhaps one or more of the drives is running out of spare sectors for
reallocation? I once had a case where smartctl showed no issues but a ZFS
scrub showed a defect; some weeks later smartctl was showing some
reallocated sectors, and one week after that the HD was out of spare sectors.

Have you already tested every single HD for SMART issues?
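
If it helps, checking 24 drives by hand is tedious, so here is a rough
sketch of how the check could be scripted. It assumes smartmontools is
installed and that "smartctl -A" prints the usual ATA attribute table. The
attribute names watched, the helper names, and the /dev/daN device naming
are my assumptions, not something taken from your setup; adjust as needed.

```python
import re
import subprocess

# Attributes that commonly point at failing media. The names are as printed
# by "smartctl -A" for ATA drives (assumption: your drives report them).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for the watched attributes."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            match = re.match(r"\d+", fields[9])
            if match:
                found[fields[1]] = int(match.group())
    return found


def check_disk(device):
    """Run smartctl on one device and return its watched attributes."""
    # smartctl uses a bitmask exit status, so a non-zero code is not
    # treated as a failure here; only the printed table is parsed.
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return parse_smart_attributes(result.stdout)


def report(devices):
    """Print one line per device, flagging any non-zero raw value."""
    for device in devices:
        attrs = check_disk(device)
        flag = "CHECK" if any(v > 0 for v in attrs.values()) else "ok"
        print(device, flag, attrs)
```

Run as root, something like report([f"/dev/da{n}" for n in range(24)])
would cover the whole pool, assuming the disks show up as da0 to da23. As
in the case I described, clean attributes do not prove a drive is healthy,
so it is worth repeating the check over the following weeks.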
