I agree with Anderson's thoughts, mainly that if you want to go with RAID5 
then you should analyze your current workload to see whether it is mostly read 
operations or whether you have a heavy write component.  With a heavy metadata 
write workload, read-modify-write penalties and write-amplification wear will 
become an issue for both the performance and the lifespan of the SSDs.  The 
same applies to data-in-inode writes.  You can inspect the current workload 
with standard iostat, mmdiag --iohist, mmpmon, and the GPFS performance 
monitoring (perfmon) tools.
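As a rough starting point (the exact options and intervals are just 
illustrative; adjust for your environment):

    iostat -xk 5            # per-device throughput, await, and utilization
    mmdiag --iohist         # recent GPFS I/O history, incl. read vs. write and size
    echo io_s | mmpmon      # quick per-node read/write counters

If you have the performance monitoring sensors configured, mmperfmon query can 
also show longer-term history.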

We have SSDs in both RAID1 (metadata) and RAID5 (data) configurations.  We're 
using the RAID controllers to split each RAID set into multiple virtual 
volumes so that more NSD servers can host the storage, which increases the 
number of outstanding I/O commands (i.e. queue depth x N LUNs > queue depth x 
1 LUN) being sent to the storage.  Since there is no seek penalty on SSD, this 
is working well for us.
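To put rough numbers on it (assuming a per-LUN queue depth of 32, purely for 
illustration): a single big LUN gives you 32 outstanding I/Os, while the same 
RAID set carved into 8 LUNs gives you 8 x 32 = 256 outstanding I/Os, spread 
across 8 NSD servers.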

As mentioned below, be sure to round-robin the ServerList for the NSDs to 
spread the load across servers.

Hope that helps!
-Bryan

From: [email protected] 
<[email protected]> On Behalf Of Anderson Ferreira Nobre
Sent: Wednesday, September 5, 2018 11:51 AM
To: [email protected]
Cc: [email protected]
Subject: Re: [gpfsug-discuss] RAID type for system pool

Hi Kevin,

RAID5 is a good fit when the read ratio of your I/O is 70% or more.  Before 
creating two RAID5 arrays, consider the size of the disks and the time it 
takes to rebuild the RAID after a failure.  A single RAID5 might be better, 
because you would have more disks in the back end working for a single array.  
Since you are using SSDs, the rebuild time should always be fast, so you 
shouldn't need RAID6.  It's worth checking the manual of the SAS RAID 
controller to see how long a rebuild takes after a failure.
Regarding the controller stripe size versus the GPFS block size: this is just 
a guess, and you would need to run performance tests to be sure, but you could 
match the RAID stripe width to the metadata block size.  I think that is the 
best you can do.
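For example, if the controller segment size were 256 KiB (an assumption, check 
your controller), a 3+1P RAID5 has a full-stripe width of 3 x 256 KiB = 768 
KiB, which does not correspond to any GPFS block size, whereas 4+1P gives 
4 x 256 KiB = 1 MiB, which lines up exactly with a 1 MiB block size.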
Breaking the storage into several LUNs is a good idea so that you don't build 
up a large queue length on any single LUN, especially if the I/O profile is 
many small-block I/Os.
Balancing the LUNs across the NSD servers is a best practice: do not leave all 
the LUNs pointing to the first node.  Just remember that when you create the 
NSDs, the device name must correspond to the first node in the server list, 
which can make this laborious.  To make things easier, I create two NSD stanza 
files.  The first one lists the first node first for every NSD, like this:
%nsd device=/dev/mapper/mpatha
    nsd=nsd001
    servers=host1,host2,host3,host4
    usage=metadataOnly
    failureGroup=1
    pool=system

%nsd device=/dev/mapper/mpathb
    nsd=nsd002
    servers=host1,host2,host3,host4
    usage=metadataOnly
    failureGroup=1
    pool=system

Then I use this stanza file to create the NSDs, and create a second stanza 
file in which the server list is rotated for each NSD:
%nsd
    nsd=nsd001
    servers=host1,host2,host3,host4
    usage=metadataOnly
    failureGroup=1
    pool=system

%nsd
    nsd=nsd002
    servers=host2,host3,host4,host1
    usage=metadataOnly
    failureGroup=1
    pool=system

Then I apply the rotated server lists with mmchnsd.
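A minimal sketch of that two-step flow (the stanza file names are just 
placeholders):

    mmcrnsd -F nsd_create.stanza     # device= lines are valid on host1, so creation works
    mmchnsd -F nsd_rotate.stanza     # same NSDs, server lists rotated to spread the load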

Abraços / Regards / Saludos,


Anderson Nobre
AIX & Power Consultant
Master Certified IT Specialist
IBM Systems Hardware Client Technical Team – IBM Systems Lab Services





________________________________

Phone: 55-19-2132-4317
E-mail: [email protected]<mailto:[email protected]>




----- Original message -----
From: "Buterbaugh, Kevin L" 
<[email protected]<mailto:[email protected]>>
Sent by: 
[email protected]<mailto:[email protected]>
To: gpfsug main discussion list 
<[email protected]<mailto:[email protected]>>
Cc:
Subject: [gpfsug-discuss] RAID type for system pool
Date: Wed, Sep 5, 2018 12:35 PM

Hi All,

We are in the process of finalizing the purchase of some new storage arrays (so 
no sales people who might be monitoring this list need to contact me) to 
life-cycle some older hardware.  One of the things we are considering is the 
purchase of some new SSDs for our “/home” filesystem, and I have a question or 
two related to that.

Currently, the existing home filesystem has its metadata on SSDs … two RAID 1 
mirrors and metadata replication set to two.  However, the filesystem itself is 
old enough that it uses 512 byte inodes.  We have analyzed our users' files and 
know that if we create a new filesystem with 4K inodes, a very significant 
portion of the files would then have their _data_ stored in the inode as well, 
since those files are 3.5K or smaller (currently all data is on spinning HD 
RAID 1 mirrors).

Of course, if we increase the size of the inodes by a factor of 8 then we also 
need 8 times as much space to store those inodes.  Given that Enterprise class 
SSDs are still very expensive and our budget is not unlimited, we’re trying to 
get the best bang for the buck.
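As a rough illustration of that factor of 8 (assuming, say, 100 million inodes 
and ignoring overhead): 100M x 512 bytes is about 48 GiB of inode space, while 
100M x 4 KiB is about 381 GiB, and metadata replication of two doubles both 
numbers.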

We have always - even back in the day when our metadata was on spinning disk 
and not SSD - used RAID 1 mirrors and metadata replication of two.  However, we 
are wondering if it might be possible to switch to RAID 5?  Specifically, what 
we are considering doing is buying 8 new SSDs and creating two 3+1P RAID 5 LUNs 
(metadata replication would stay at two).  That would give us 50% more usable 
space than if we configured those same 8 drives as four RAID 1 mirrors.
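(Eight drives as four RAID 1 mirrors give 4 drives' worth of usable capacity, 
while two 3+1P RAID 5 arrays give 6 drives' worth, hence the 50% figure.)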

Unfortunately, unless I'm misunderstanding something, that would mean that the 
RAID stripe size and the GPFS block size could not match.  Therefore, even 
though we don't need the space, would we be much better off to buy 10 SSDs and 
create two 4+1P RAID 5 LUNs?

I’ve searched the mailing list archives and scanned the DeveloperWorks wiki and 
even glanced at the GPFS documentation and haven’t found anything that says 
“bad idea, Kevin”… ;-)

Expanding on this further … if we just present those two RAID 5 LUNs to GPFS as 
NSDs then we can only have two NSD servers as primary for them.  So another 
thing we’re considering is to take those RAID 5 LUNs and further sub-divide 
them into a total of 8 logical volumes, each of which could be a GPFS NSD and 
therefore would allow us to have each of our 8 NSD servers be primary for one 
of them.  Even worse idea?!?  Good idea?

Anybody have any better ideas???  ;-)

Oh, and currently we’re on GPFS 4.2.3-10, but are also planning on moving to 
GPFS 5.0.1-x before creating the new filesystem.

Thanks much…

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
[email protected]<mailto:[email protected]> - 
(615)875-9633

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



