I cannot recall a time when manipulating a pool during a scrub or resilver did not restart the scrub/resilver operation.
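For what it's worth, one way to confirm that the resilver really is restarting (rather than just crawling) is to poll the scan progress and check whether it ever moves backwards. The sketch below is untested and makes a few assumptions: the pool name "tank" is a placeholder, and it parses the human-readable "zpool status" output, whose wording differs between ZFS releases, so treat it as illustrative only.

    #!/usr/bin/env python3
    # Rough sketch (assumptions noted above): poll "zpool status" and log
    # whenever the resilver's scanned byte count goes backwards, which would
    # suggest the scan restarted from the beginning.
    import re
    import subprocess
    import time

    POOL = "tank"  # hypothetical pool name; substitute your own

    UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40, "P": 2**50}

    def scanned_bytes():
        """Return the 'scanned' figure from zpool status, or None if no scan is running."""
        out = subprocess.run(["zpool", "status", POOL],
                             capture_output=True, text=True, check=True).stdout
        # Typical line: "  scan: resilver in progress ... 1.23T scanned ..."
        m = re.search(r"([\d.]+)([KMGTP]?) scanned", out)
        if not m:
            return None
        value, unit = float(m.group(1)), m.group(2)
        return int(value * UNITS.get(unit, 1))

    last = None
    while True:
        cur = scanned_bytes()
        if cur is not None and last is not None and cur < last:
            print(time.strftime("%F %T"),
                  f"resilver progress went backwards ({last} -> {cur} bytes); "
                  "it likely restarted")
        last = cur
        time.sleep(60)

Running something like this for a day on one of the affected systems would at least tell you how often the restarts happen and whether they line up with zfsd onlining a drive.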
J.

Sent from my iPhone

> On Jul 9, 2018, at 2:39 PM, Ken Merry <[email protected]> wrote:
>
> Hi ZFS folks,
>
> We (Spectra Logic) have seen some odd behavior with resilvers in RAIDZ3 pools.
>
> The codebase in question is FreeBSD stable/11 from July 2017, at
> approximately FreeBSD SVN version 321310.
>
> We have customer systems with (sometimes) hundreds of SMR drives in RAIDZ3
> vdevs in a large pool. (A typical arrangement is a 23-drive RAIDZ3, and some
> customers will put everything in one giant pool made up of a number of
> 23-drive RAIDZ3 arrays.)
>
> The SMR drives in question have a bug that sometimes causes them to go off
> the SAS bus for up to two minutes. (They’re usually gone a lot less than
> that, up to 10 seconds.) Once they come back online, zfsd puts the drive
> back in the pool and makes it online.
>
> If a resilver is active on a different drive, once the drive that temporarily
> left comes back, the resilver apparently starts over from the beginning.
>
> This leads to resilvers that take forever to complete, especially on systems
> with high load.
>
> Is this expected behavior?
>
> It seems that only one scan can be active on a pool at any given time. Is
> that correct? If so, is that true for an entire top-level pool, or just a
> given redundancy group? (In this case, it would be the RAIDZ3 vdev.)
>
> Is there anything we can do to make sure the resilvers complete in a
> reasonable period of time or otherwise improve the behavior? (Short of
> putting in different drives… I have already suggested that.)
>
> Thanks,
>
> Ken
> —
> Ken Merry
> [email protected]
