Also, ignoring the resilver speed issue, if you have one big raidz2 device, you're basically guaranteed to lose data eventually. Lose a controller, or just 3 disks (most of my disk failures happened while recovering from another failure, for example), and you are probably toast.
On the 4500s we had been putting the OS on a CF card and did 4x11+4, but OS on RAID+4x11+2 also works.

On Jul 7, 2012 9:13 AM, "Edward Ned Harvey" <[email protected]> wrote:

> > From: Peter Baer Galvin [mailto:[email protected]]
> > Sent: Friday, July 06, 2012 10:50 AM
> >
> > Hmm, resilvering performance has greatly increased over time Ned. With
> > which version of ZFS did you have the never-completing problem?
>
> I haven't had the problem myself, because I know enough to avoid it. I
> participate a lot in the zfs-discuss mailing list (which was formerly
> extremely active, including ZFS developers, but now it's mostly just other
> IT people offering advice to each other, since the Oracle takeover).
>
> The root cause of the problem is like this:
>
> In a ZFS resilver, they decided to be clever. By comparison to a hardware
> RAID resilver, which must resilver the entire disk, including unused
> blocks, a ZFS resilver only resilvers the used blocks. Theoretically this
> should make resilvering very fast, right? Unfortunately, no. Because the
> hardware resilver sequentially does each block of the whole disk, it's
> easy to calculate the whole-disk resilver time as the total disk capacity
> divided by the sustained sequential speed of the drive. Something on the
> order of 2 hours, depending on your drive. But in ZFS, they don't have
> any way to *sort* the used blocks into disk-sequential order. The
> resilver ordering is approximated by temporal order. And, assuming you
> have a mostly full pool (>50%) that's been in production for a while,
> reading & writing, creating & destroying snapshots, it means temporal
> order is approximated by random order. So ZFS resilvering is approximated
> by random IO for all your used blocks. This is very much dependent on
> your individual specific usage patterns.
>
> Resilvering is a per-vdev operation.
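The "order of 2 hours" sequential figure above is just capacity divided by sustained throughput; a quick sanity check (drive size and speed here are illustrative assumptions, not numbers from the thread):

```python
# Sequential (whole-disk) resilver estimate, as for a hardware RAID rebuild:
# total disk capacity divided by sustained sequential throughput.
capacity_tb = 1.0        # assumed drive size, in TB
seq_mb_per_s = 130.0     # assumed sustained sequential speed, in MB/s

seconds = capacity_tb * 1_000_000 / seq_mb_per_s
hours = seconds / 3600
print(f"{hours:.1f} hours")  # about 2.1 hours for these numbers
```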
> If we assume the size of the pool & the size of the data are given by
> design & budget constraints, and you are faced with the decision to
> organize your pool into a big raidz versus divide your pool up into a
> bunch of mirrors, it means you have less data in each mirror to resilver.
> Naturally, for equal usable capacity, mirrors cost more. For the sake of
> illustrating my point, I've assumed you're able to consider a big raidz
> versus an equivalently sized (higher-cost) bunch of mirrors. The
> principle holds even if you scale up or scale down: if you have a set
> amount of data divided across a configurable number of vdevs, you will
> have less to resilver if you choose to have more vdevs.
>
> Also, random IO for a raidzN (or raid5 or raid6 or RAID-DP) is
> approximated by the worst-case access time of any individual disk
> (approx 2x slower than the average access time of a single disk).
> Meanwhile, random IO for a mirror is approximated by the average access
> time of an individual disk.
>
> So if you break up your pool into a bunch of mirrors rather than a large
> raidzN, you have both a faster ability to perform random IO (factor of
> 2x) and less random IO that needs to be done (factor of Mx, where M is
> how many times smaller the mirror is compared to the raidz. If you obey
> the rule of thumb "limit raidz to 8-10 disks per vdev," then Mx is
> something like a factor of 8x). The end result is a factor of ~16x
> faster using mirrors instead of raidz.
>
> So in rough numbers, a 46-disk raidz2 (capacity of 44 disks) will be
> approximately 88 times slower to resilver than a bunch of mirrors.
>
> In systems that I support, I only deploy mirrors. When I have a
> resilver, I expect it to take 12 hours. By comparison, if this were a
> hardware RAID, it would resilver in 2 hours... And if it were one big
> raidz, it would resilver in approx 6 weeks.
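The 88x and six-week figures above follow from simple multiplication of the two factors; a sketch using the email's own rough numbers (44 mirror vdevs for 44 disks of usable capacity, 2x random-IO penalty for raidz, 12-hour mirror resilver):

```python
# Mirror-vs-raidz resilver comparison, using the rough figures from the email.
mirror_resilver_hours = 12   # the author's observed mirror resilver time
num_mirror_vdevs = 44        # mirrors sized for the same 44 disks of usable capacity
random_io_penalty = 2        # raidzN random IO is ~2x slower than a mirror

slowdown = num_mirror_vdevs * random_io_penalty       # ~88x slower
raidz_hours = mirror_resilver_hours * slowdown
print(slowdown, round(raidz_hours / (24 * 7), 1))     # 88 and ~6.3 weeks
```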
_______________________________________________
bblisa mailing list
[email protected]
http://www.bblisa.org/mailman/listinfo/bblisa
