I don't know enough about ZFS internals to say how resilvers work, but I have some personal experience relevant to this issue. Due to a power supply problem, I was having drives go offline. Each drive would come right back when onlined and start to resilver. By the time I found out what the problem was and fixed it, I had three drives of a raidz2 vdev all resilvering at once. I thought this was a little much, so I killed the processes using the pool and waited until the resilvers completed before restarting them. There was no data loss during the resilvers or a subsequent scrub.
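For anyone who wants to script that "wait until the resilvers complete" step, here is a minimal sketch in Python that just polls zpool status until the scan line no longer reports a resilver in progress. The pool name "tank" and the polling interval are placeholders, not anything from this thread:

#!/usr/bin/env python3
"""Minimal sketch: wait for a ZFS resilver to finish.

Polls 'zpool status <pool>' and returns once the output no longer
says "resilver in progress". Pool name and interval are placeholders.
"""
import subprocess
import time

POOL = "tank"      # hypothetical pool name
INTERVAL = 60      # seconds between polls

def resilver_active(pool):
    # 'zpool status' prints a line like
    #   scan: resilver in progress since ...
    # while a resilver is running.
    out = subprocess.check_output(["zpool", "status", pool], text=True)
    return "resilver in progress" in out

while resilver_active(POOL):
    time.sleep(INTERVAL)
print("resilver on %s finished; safe to restart the workload" % POOL)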
On Wed, Jul 11, 2018 at 10:16 AM, Kenneth D. Merry <[email protected]> wrote:
> On Tue, Jul 10, 2018 at 23:30:06 +0100, Andrew Gabriel wrote:
> > On 10/07/2018 09:26, Pawel Jakub Dawidek wrote:
> > > On 7/9/18 23:39, Ken Merry wrote:
> > >> Hi ZFS folks,
> > >>
> > >> We (Spectra Logic) have seen some odd behavior with resilvers in RAIDZ3 pools.
> > >>
> > >> The codebase in question is FreeBSD stable/11 from July 2017, at approximately FreeBSD SVN version 321310.
> > >>
> > >> We have customer systems with (sometimes) hundreds of SMR drives in RAIDZ3 vdevs in a large pool. (A typical arrangement is a 23-drive RAIDZ3, and some customers will put everything in one giant pool made up of a number of 23-drive RAIDZ3 arrays.)
> > >>
> > >> The SMR drives in question have a bug that sometimes causes them to go off the SAS bus for up to two minutes. (They're usually gone a lot less than that, up to 10 seconds.) Once they come back online, zfsd puts the drive back in the pool and makes it online.
> > >>
> > >> If a resilver is active on a different drive, once the drive that temporarily left comes back, the resilver apparently starts over from the beginning.
> > >>
> > >> This leads to resilvers that take forever to complete, especially on systems with high load.
> > >
> > > Since resilver is single threaded, adding the drive immediately doesn't buy you any additional redundancy. Maybe it would make sense for zfsd to delay reinserting the drive until after the ongoing resilver is done?
> >
> > A zpool can only have one resilver running, but it resilvers all the disks requiring resilver together. So yes, it starts again when an additional drive returns to the zpool, but then does them all in parallel.
>
> Ok. Does it go back to the beginning, or to the lowest common denominator transaction group in that case?
>
> > Doing them serially would take longer, which won't help if your drive failure rate is already bumping up against your drive resilver completion rate. It would also mean it doesn't have data from the later drive available to reconstruct data on the original resilvering drive.
>
> So if you online a drive, can ZFS access data from the onlined drive before the resilver gets to that point?
>
> > You need to get your drive firmware fixed - it's plain broken.
>
> The vendor released a firmware update that helped a bit, but has refused to do any more for the drive model in question. They have a newer model of the same drive (which we're shipping now) that is far less likely (10% of the rate of the older drive) to exhibit the problem. But that doesn't fix the customers with the older drive already out in the field.
>
> Yes, I agree, the drive is broken.
>
> Ken
> --
> Kenneth Merry
> [email protected]
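Pawel's suggestion above (have zfsd hold a returned drive out of the pool until the ongoing resilver finishes) could be prototyped outside zfsd. The sketch below is only an illustration of that policy, not how zfsd is actually implemented; the pool name "tank" and device name "da12" are hypothetical:

#!/usr/bin/env python3
"""Illustration of the "delay reinsertion" policy discussed above.

Instead of onlining a returned drive immediately, wait for any
in-progress resilver to finish, then run 'zpool online'. This is a
sketch of the policy only, not zfsd's real behavior, and the
pool/device names below are hypothetical.
"""
import subprocess
import time

def resilver_active(pool):
    out = subprocess.check_output(["zpool", "status", pool], text=True)
    return "resilver in progress" in out

def online_when_idle(pool, dev, poll=30):
    # Keep the returned drive out of the pool until the current
    # resilver completes, so onlining it cannot restart the scan.
    while resilver_active(pool):
        time.sleep(poll)
    subprocess.check_call(["zpool", "online", pool, dev])

online_when_idle("tank", "da12")    # placeholder pool and device

As Pawel notes, the held-out drive is not contributing redundancy during the resilver anyway, so the main cost of a policy like this is the extra delay before the drive rejoins the pool.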
