On Tue, Jul 10, 2018 at 10:26:52 +0200, Pawel Jakub Dawidek wrote:
> On 7/9/18 23:39, Ken Merry wrote:
> > Hi ZFS folks,
> > 
> > We (Spectra Logic) have seen some odd behavior with resilvers in RAIDZ3 
> > pools.
> > 
> > The codebase in question is FreeBSD stable/11 from July 2017, at 
> > approximately FreeBSD SVN version 321310.
> > 
> > We have customer systems with (sometimes) hundreds of SMR drives in RAIDZ3 
> > vdevs in a large pool.  (A typical arrangement is a 23-drive RAIDZ3, and 
> > some customers will put everything in one giant pool made up of a number of 
> > 23-drive RAIDZ3 arrays.)
> > 
> > The SMR drives in question have a bug that sometimes causes them to go off 
> > the SAS bus for up to two minutes.  (They're usually gone a lot less than 
> > that, up to 10 seconds.)  Once they come back online, zfsd puts the drive 
> > back in the pool and makes it online.
> > 
> > If a resilver is active on a different drive, once the drive that 
> > temporarily left comes back, the resilver apparently starts over from the 
> > beginning.
> > 
> > This leads to resilvers that take forever to complete, especially on 
> > systems with high load.
> 
> Since resilver is single threaded, adding the drive immediately doesn't
> buy you any additional redundancy. Maybe it would make sense for
> zfsd to delay reinserting the drive until after the ongoing resilver is done?

If adding a drive immediately doesn't buy you any additional redundancy,
that implies that ZFS isn't able to use the data on the re-inserted drive
at all until the first resilver finishes.

Consider the following scenario:

1.  Drive A fails in a RAIDZ3 pool, and a resilver starts onto drive X.
2.  When the resilver onto drive X is 50% done, drives B, C, and D drop out
    and come back (and are onlined) a few seconds later.
3.  A read comes in that references data that is past the 50% mark on drive
    X but the data in question is contained on drives B, C and D.  (In
    other words, the data in question didn't get written in the few seconds
    that they were offline.)

If what you are saying is correct, ZFS would not be able to complete the
read, even though the data is there and available.  The resilver would also
stall, since it would require at least one of drives B, C or D to continue.
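To make the redundancy arithmetic concrete, here is a small illustrative
sketch (plain Python, not ZFS code).  It only assumes the basic RAIDZ3
property that a stripe can be reconstructed as long as at most three of
its members are unusable; the drive names and the "policy" framing are
mine, for illustration:

```python
# Illustrative sketch (NOT ZFS code): count how many members of a
# RAIDZ3 stripe are unusable for a given read, under two policies
# for drives that dropped off the bus and came back.
# Assumption: RAIDZ3 can reconstruct a stripe with up to 3 members missing.

PARITY = 3  # RAIDZ3 tolerates up to 3 unusable members per stripe

def stripe_readable(unusable_drives):
    """A stripe is readable if no more than PARITY members are unusable."""
    return len(unusable_drives) <= PARITY

# Scenario from the message above:
#   - Drive A failed outright; X is its replacement, resilvered to 50%.
#   - The block being read is past the 50% mark, so X doesn't have it yet.
#   - B, C and D dropped off briefly and came back; the block predates
#     their outage, so their on-disk copies are valid.

# Policy 1: ZFS uses the valid data on the re-onlined drives.
# Only A/X is actually missing this block: 1 <= 3, the read succeeds.
print(stripe_readable({"X"}))                     # True

# Policy 2 (the interpretation being questioned): re-onlined drives are
# unusable until the resilver finishes.  Then A/X, B, C and D are all
# missing: 4 > 3, and the read cannot be completed.
print(stripe_readable({"X", "B", "C", "D"}))      # False
```

If Policy 2 were how ZFS actually behaved, the read in step 3 above
would fail despite the data being intact on B, C and D.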

If resilvers are single threaded, does that mean that there can only be one
resilver active on a pool, even if it contains hundreds of drives and lots of
separate redundant vdevs?

Thanks,

Ken
-- 
Kenneth Merry
k...@freebsd.org

------------------------------------------
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/T2a7340f4c0c48fa9-Me1396a5219b5f64091692854