Some follow-up to the discussion I kicked off a few days ago. Using simple GPFS replication across two sites looked like a good option, until you consider that it's really RAID 5: if the replica copy of the data fails during the restripe, you lose data. It's not quite as bad as RAID 5, because the data blocks for a file are spread across multiple servers rather than being reconstructed from a single array.
RAID 6 + metadata replication isn't a bad option, but you are vulnerable to server failure. Its relatively low expansion factor makes it attractive. My personal recommendation is going to be RAID 6 + metadata replication (using the "unmountOnDiskFail=meta" option), keeping a spare server around to minimize downtime if one fails. Array rebuild times will impact performance, but that's the price of having integrity. Comments?

RAID 6 (6+2) + metadata replication
  Expansion factor: 1.25+
  Data availability (disk failure): High
  Data availability (server failure): Low
  Data integrity: High
  Comments: Single server or single LUN failure results in some data being
  unavailable. Single drive failure: lower performance during array rebuild.

2-site replication (GPFS)
  Expansion factor: 2
  Data availability (disk failure): High
  Data availability (server failure): High
  Data integrity: Low
  Comments: Similar to RAID 5 - vulnerable to multiple disk failures. Rebuild
  done via GPFS restripe. URE-vulnerable during restripe, but data
  distribution may mitigate this.

RAID 6 (6+2) + full 2-site replication (GPFS)
  Expansion factor: 2.5
  Data availability (disk failure): High
  Data availability (server failure): High
  Data integrity: High
  Comments: Protected against single server and double drive failures. Single
  drive failure: lower performance during array rebuild.

Full 3-site replication (GPFS)
  Expansion factor: 3
  Data availability (disk failure): High
  Data availability (server failure): High
  Data integrity: High
  Comments: Similar to RAID 6. Protected against single server and double
  drive failures. Rebuild done via GPFS restripe.

Bob Oesterlin
Sr Principal Storage Engineer, Nuance
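As a footnote on the expansion-factor column: here is a quick back-of-the-envelope sketch (my own convention, not an official GPFS sizing formula) treating expansion factor as raw capacity consumed per unit of usable data. Note the table's 1.25 counts the two parity disks against the full eight-disk stripe (2/8 = 25% overhead); the raw/usable convention below gives 8/6, about 1.33, for a 6+2 array.

```python
# Hypothetical sketch of the expansion factors in the table above.
# "Expansion factor" here = raw capacity / usable capacity.

def raid_factor(data_disks: int, parity_disks: int) -> float:
    """Raw/usable ratio for a parity RAID stripe (e.g. 6+2)."""
    return (data_disks + parity_disks) / data_disks

def replicated(base: float, copies: int) -> float:
    """Apply N-way GPFS replication on top of a base layout."""
    return base * copies

raid6 = raid_factor(6, 2)              # 8/6, about 1.33 raw per usable
two_site = replicated(1.0, 2)          # 2.0: two full copies of the data
raid6_two_site = replicated(raid6, 2)  # about 2.67 under this convention
three_site = replicated(1.0, 3)        # 3.0: three full copies
```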
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
