Hi All,

Wow - my query got more responses than I expected and my sincere thanks to all 
who took the time to respond!

Right now we have two GPFS filesystems: one is basically “/home” plus some 
software installations, and the other holds “/scratch” and “/data” (former 
backed up, latter not).  Both of them have their metadata on SSDs set up as 
RAID 1 mirrors and replication set to two.  However, all of the SSDs currently 
sit in a single storage array (albeit with dual redundant controllers), so the 
storage array itself is my only SPOF.

As part of the hardware purchase we are currently making, we will be buying a 
2nd storage array that can house 2.5” SSDs.  That will let us split our SSDs 
between chassis and eliminate that last SPOF.  Of course, this includes the new 
SSDs we are getting for our new /home filesystem.

Our plan right now is to buy 10 SSDs, which will allow us to test 3 
configurations:

1) two 4+1P RAID 5 LUNs split up into a total of 8 LVs (with each of my 8 NSD 
servers as primary for one of those LVs and the other 7 as backups) and GPFS 
metadata replication set to 2.

2) four RAID 1 mirrors (which obviously leaves 2 SSDs unused) and GPFS metadata 
replication set to 2.  This would mean that only 4 of my 8 NSD servers would be 
a primary.

3) nine RAID 0 / bare drives with GPFS metadata replication set to 3 (which 
leaves 1 SSD unused).  Each of the 8 NSD servers would be primary for one SSD, 
with one of them also primary for a second.
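
To make the primary/backup layout in the three options concrete, here is a 
minimal Python sketch.  The server names (nsd1 through nsd8) and the simple 
round-robin rotation are assumptions for illustration only, not our actual 
configuration; the sketch just counts how many LUNs each server ends up primary 
for.

```python
def server_order(servers, index):
    """Rotate the list so servers[index % len(servers)] comes first (the primary)."""
    start = index % len(servers)
    return servers[start:] + servers[:start]


def assign(servers, lun_count):
    """Map each LUN index to an ordered server list: primary first, backups after."""
    return {lun: server_order(servers, lun) for lun in range(lun_count)}


def primaries(assignment):
    """Count how many LUNs each server is primary for."""
    counts = {}
    for order in assignment.values():
        counts[order[0]] = counts.get(order[0], 0) + 1
    return counts


# Hypothetical server names; the real hostnames are not given in this post.
servers = [f"nsd{i}" for i in range(1, 9)]

# LUN counts per option: 8 LVs, 4 RAID 1 mirrors, 9 bare drives.
for label, luns in (("option 1", 8), ("option 2", 4), ("option 3", 9)):
    print(label, primaries(assign(servers, luns)))
```

With these assumptions, option 1 gives every server exactly one primary, option 
2 leaves four servers with none, and option 3 gives one server two.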

The responses I received concerning RAID 5 and performance were not a surprise 
to me.  The main advantage that option gives is the most usable storage space 
for the money (in fact, it gives us way more storage space than we currently 
need) … but if it tanks performance, then that’s a deal breaker.

Personally, I like the four RAID 1 mirrors config like we’ve been using for 
years, but it has the disadvantage of giving us the least usable storage space 
… that config would give us the minimum we need for right now, but doesn’t 
really allow for much future growth.

I have no experience with metadata replication of 3 (but had actually thought 
of that option, so feel good that others suggested it), so option 3 will be a 
brand new experience for us.  It is the best fit in terms of meeting current 
needs plus allowing for future growth without giving us way more space than we 
are likely to need.  I will be curious to see how long it takes GPFS to 
re-replicate the data when we simulate a drive failure as opposed to how long a 
RAID rebuild takes.
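
The space trade-off between the three options can be sketched with some quick 
arithmetic.  This is a rough Python sketch that assumes all SSDs are the same 
size and ignores filesystem overhead; the function name is made up for 
illustration.  Capacity is expressed in units of one SSD.

```python
def usable_drive_equivalents(data_drives, replicas):
    """Usable metadata capacity, in units of one SSD, after GPFS replication."""
    return data_drives / replicas


# (drives' worth of space presented to GPFS, GPFS metadata replication factor)
options = {
    "1) two 4+1P RAID 5, replication 2": (2 * 4, 2),   # 10 SSDs, 2 go to parity
    "2) four RAID 1 mirrors, replication 2": (4, 2),   # 8 SSDs yield 4 drives of space
    "3) nine bare drives, replication 3": (9, 3),      # 9 SSDs, no RAID overhead
}

capacity = {
    name: usable_drive_equivalents(drives, replicas)
    for name, (drives, replicas) in options.items()
}

for name, value in capacity.items():
    print(f"{name}: {value:.1f} SSDs of usable metadata space")
```

Under those assumptions the three options come out at 4.0, 2.0 and 3.0 SSDs of 
usable space respectively, which matches the ordering described above: RAID 5 
the most, RAID 1 the least, and replication of 3 in between.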

I am a big believer in Murphy’s Law (Sunday I paid off a bill, Wednesday my 
refrigerator died!) … and also believe that the definition of a pessimist is 
“someone with experience” <grin> … so we will definitely not set GPFS metadata 
replication to less than two, nor will we use non-Enterprise class SSDs for 
metadata … but I do still appreciate the suggestions.

If there is interest, I will report back on our findings.  If anyone has any 
additional thoughts or suggestions, I’d also appreciate hearing them.  Again, 
thank you!

Kevin

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
[email protected] - (615)875-9633

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
