I cannot recall a time when manipulating a pool during a scrub or resilver 
did not restart the scrub/resilver operation. 
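
For what it's worth, here is a minimal sketch of how one might confirm the
restarts from userland by polling `zpool status` and flagging any large
backwards jump in the scanned byte count. The pool name "tank" and the
old-style "X scanned out of Y" wording are assumptions based on the
FreeBSD 11-era output format; adjust the regex for your systems.

    #!/usr/bin/env python3
    # Hedged sketch: poll `zpool status` and report apparent resilver
    # restarts (scanned bytes dropping back toward zero).
    import re
    import subprocess
    import time

    POOL = "tank"  # hypothetical pool name; substitute your own

    UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40, "P": 2**50}

    def to_bytes(num, unit):
        # Convert e.g. ("1.62", "T") into a byte count.
        return float(num) * UNITS.get(unit, 1)

    last_scanned = 0.0
    while True:
        out = subprocess.run(["zpool", "status", POOL],
                             capture_output=True, text=True).stdout
        # Old-style progress line, e.g.:
        #   "1.62T scanned out of 4.25T at 124M/s, 6h5m to go"
        m = re.search(r"([\d.]+)([KMGTP]?) scanned out of", out)
        if m:
            scanned = to_bytes(m.group(1), m.group(2))
            # A drop of more than ~1 GiB suggests the scan started over.
            if scanned + UNITS["G"] < last_scanned:
                print(time.strftime("%F %T"),
                      "possible resilver restart: scanned fell from",
                      last_scanned, "to", scanned)
            last_scanned = scanned
        time.sleep(60)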

J. 


> On Jul 9, 2018, at 2:39 PM, Ken Merry <[email protected]> wrote:
> 
> Hi ZFS folks,
> 
> We (Spectra Logic) have seen some odd behavior with resilvers in RAIDZ3 pools.
> 
> The codebase in question is FreeBSD stable/11 from July 2017, at 
> approximately SVN revision r321310.
> 
> We have customer systems with (sometimes) hundreds of SMR drives in RAIDZ3 
> vdevs in a large pool.  (A typical arrangement is a 23-drive RAIDZ3, and some 
> customers will put everything in one giant pool made up of a number of 
> 23-drive RAIDZ3 arrays.)
> 
> The SMR drives in question have a bug that sometimes causes them to drop off 
> the SAS bus for up to two minutes.  (They’re usually gone for much less than 
> that, typically 10 seconds or less.)  Once they come back, zfsd puts the 
> drive back in the pool and brings it online.
> 
> If a resilver is active on a different drive, once the drive that temporarily 
> left comes back, the resilver apparently starts over from the beginning.
> 
> This leads to resilvers that take forever to complete, especially on systems 
> with high load.
> 
> Is this expected behavior?
> 
> It seems that only one scan can be active on a pool at any given time.  Is 
> that correct?  If so, is that true for an entire top level pool, or just a 
> given redundancy group?  (In this case, it would be the RAIDZ3 vdev.)
> 
> Is there anything we can do to make sure the resilvers complete in a 
> reasonable period of time or otherwise improve the behavior?  (Short of 
> putting in different drives…I have already suggested that.)
> 
> Thanks,
> 
> Ken
>  —
> Ken Merry
> [email protected]
> 
