2016-03-08 4:55 GMT+08:00 Liam Slusser <lslus...@gmail.com>:

> I don't have a 2000 drive array (that's amazing!) but I do have two 280
> drive arrays which are in production. Here are the generic stats:
>
> server setup:
> OpenIndiana oi_151
> 1 server rack
> Dell r720xd 64g RAM with mirrored 250g boot disks
> 5 x LSI 9207-8e dualport SAS pci-e host bus adapters
> Intel 10g fibre ethernet (dual port)
> 2 x SSD for log cache
> 2 x SSD for cache
> 23 x Dell MD1200 with 3T, 4T, or 6T NLSAS disks (a mix of Toshiba, Western
> Digital, and Seagate drives - basically whatever Dell sends)
>
> zpool setup:
> 23 x 12-disk raidz2 glued together. 276 total disks. Basically each new
> 12 disk MD1200 is a new raidz2 added to the pool.
>
> Total size: ~797T
>
> We have an identical server to which we replicate changes via zfs snapshots
> every few minutes. The whole setup has been up and running for a few years
> now, no issues. As we run low on space we purchase two additional MD1200
> shelves (one for each system) and add the new raidz2 into the pool on-the-fly.
>
> The only real issue we've had is that sometimes a disk fails in such a way
> (think Monty Python and the Holy Grail, "I'm not dead yet") where the disk
> hasn't failed but is timing out, and it slows the whole array to a standstill
> until we can manually find and remove the disk. The other problem is that
> once a disk has been replaced, the resilver process can sometimes take
> an eternity. We have also found the snapshot replication process can
> interfere with the resilver process - resilver gets stuck at 99% and never
> ends - so we end up stopping replication, or only doing one replication a
> day, until the resilver process is done.
>
> The last helpful hint I have is lowering all the drive timeouts, see
> http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/
> for info.

[Fred]: A zpool with 280 drives in production is pretty big! I think the 2000
drives were only in a test. It is true that huge pools bring a lot of
operational challenges. I have run into a similar slowdown caused by a
dying disk. Just curious, what is the cluster software implemented in
http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/ ?

Thanks.
Fred