Hi Adam,

We run virtualised GPFS client nodes in a VMware cluster here at the Oxford 
e-Research Centre. One research group here wanted root access to their VMs but 
also wanted fast, direct access to their data on our GPFS cluster.

The technical setup was (relatively) simple. We spun up a small three-node 
virtual GPFS cluster (with no file system of its own). We then used the 
multi-cluster feature of GPFS to let this small virtual cluster join our main 
file system cluster, which now gives that group very good I/O performance.
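
For anyone who hasn't done the multi-cluster dance before, the moving parts look 
roughly like this (a minimal sketch only -- the cluster names, contact nodes and 
key file paths below are made up, and you copy the public keys between clusters 
out of band):

  # --- on the owning (storage) cluster ---
  mmauth genkey new                  # generate this cluster's public key
  mmauth update . -l AUTHONLY        # authenticate remote clusters (GPFS typically needs to be down when first set)
  mmauth add virt-cluster.example -k /tmp/virt-cluster_id_rsa.pub
  mmauth grant virt-cluster.example -f gpfs -a rw

  # --- on the small virtual (client) cluster ---
  mmauth genkey new
  mmremotecluster add storage-cluster.example -n nsdserv01,nsdserv02 \
      -k /tmp/storage-cluster_id_rsa.pub
  mmremotefs add gpfs -f gpfs -C storage-cluster.example -T /gpfs
  mmmount gpfs -a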

However, the problem of spoofing you mention is relevant – we installed the 
virtual cluster nodes for the research group and put our Active Directory 
client on them. We also used the root-squash configuration option of the 
multi-cluster setup to prevent remote-cluster root access into our file 
system, and we agreed with the research group that they would nominate one 
Administrator to have root access in their cluster and that they would maintain 
the AAA framework we put in place. We have to trust the group’s Administrator 
not to fiddle with their UID or to let others escalate their privileges.
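
The root-squash bit is just an extra option on the grant at the owning-cluster 
end, something like the line below (remapping remote root to UID/GID 99 is my 
assumption -- use whatever your local "nobody" maps to):

  # on the owning cluster: remap root from the remote cluster to nobody (99:99)
  mmauth grant virt-cluster.example -f gpfs -a rw -r 99:99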

If you were letting untrusted root users spin up Stacks, then you could still 
run GPFS clients in the OpenStack instance nodes to give them fast access to 
their data. Here are some musings on a recipe (others please feel free to pull 
these ideas to pieces):


1.       Start with Cluster A – your main production GPFS file system. It has 
GPFS device name /gpfs.

2.       Pretend you have lots of money for extra disk to go under your 
OpenStack cluster (say you buy something like a DDN SFA7700 with a couple of 
expansion shelves and fill it up with 4TB drives – 180 drives).

3.       Drive this disk array with two, preferably four (or however many you 
want, really) decent NSD servers. Configure quorum nodes, etc. appropriately. 
Call this Cluster B.

4.       Carve up the disk array into something like 30 x RAID6 (4 + 2) LUNs 
and configure them as GPFS NSDs, but don’t create a file system yet (line up 
the stripe sizes, choose a nice block size, etc.).
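
The NSD part might look something like the stanza file below (device paths, NSD 
names, server names and failure groups are all placeholders):

  # nsd-stanzas.txt -- one %nsd stanza per RAID6 LUN
  %nsd:
    device=/dev/mapper/sfa7700_lun001
    nsd=nsd_data_001
    servers=nsdserv01,nsdserv02
    usage=dataOnly
    failureGroup=101
    pool=data
  # ...repeat for the other 29 LUNs...

  # create the NSDs, but no file system yet -- the block size is chosen later at mmcrfs time
  mmcrnsd -F nsd-stanzas.txt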

5.       Put the GPFS metadata on some SSD NSDs somewhere. I like putting it on 
SSDs in the NSD server nodes and replicating it. Other people like putting it 
in their disk arrays.
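
If you go the SSDs-in-the-NSD-servers route, the metadata stanzas just use 
usage=metadataOnly with one failure group per server so the two copies land on 
different boxes (again, every name below is made up); the replication itself is 
then set with -m 2 -M 2 when the file system is created:

  %nsd:
    device=/dev/sdb
    nsd=nsd_meta_serv01
    servers=nsdserv01
    usage=metadataOnly
    failureGroup=201
    pool=system

  %nsd:
    device=/dev/sdb
    nsd=nsd_meta_serv02
    servers=nsdserv02
    usage=metadataOnly
    failureGroup=202
    pool=system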

6.       As part of someone spinning up a Stack, get some scripts to do the 
following “magic” (a rough shell sketch of steps b to g follows the list):

a.       Connect to Cluster A and find out how big their target data-set is.

b.      Connect to Cluster B and create a new GPFS file system with a 
reasonable (dependent on the above result) number of NSD disks. Call this new 
GPFS device something unique other than /gpfs, e.g. /gpfs0001. You could slice 
bits off your SSDs for the metadata NSDs in each file system you create in this 
manner (if you haven’t got many SSDs).

c.       As part of a new Stack, provide a few (three, say) GPFS quorum nodes 
that you’ve configured. Call this Cluster C. Add the rest of the stack 
instances to Cluster C. No File System.

d.      Pop back over to Cluster A. Export their target data-set from Cluster A 
using AFM (over GPFS or NFS – pick your favourite: GPFS probably performs 
better but means you need Cluster A to stay online).

e.      Now return to Cluster B. Import the target data to a local AFM cache on 
Cluster B’s new file system. Name the AFM file-set whatever you like, but link 
it into the Cluster B /gpfs0001 namespace at the same level as it is in 
Cluster A. For example, Cluster A’s /gpfs/projects/dataset01 imports to an AFM 
fileset in Cluster B named userdataset01, linked under 
/gpfs0001/projects/dataset01.

f.        Configure multi-cluster support on Cluster B to export GPFS device 
/gpfs0001 to Cluster C. Encrypt traffic if you want a headache.

g.       Configure multi-cluster support on Cluster C to remote mount Cluster 
B:/gpfs0001 as local device /gpfs.
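
A very rough sketch of that spin-up magic, for the AFM-over-NFS flavour (every 
host name, cluster name, stanza file and path below is invented, and the exact 
mmcrfileset AFM syntax varies a little between releases -- treat this as a 
starting point, not gospel):

  # (6b) Cluster B: build this stack's file system from a subset of the free NSDs
  mmcrfs gpfs0001 -F stack0001-nsds.txt -B 4M -m 2 -M 2 -T /gpfs0001 -A yes
  mmmount gpfs0001 -a

  # (6c) Cluster C is assumed to exist already (mmcrcluster across the three
  #      quorum nodes plus the stack instances, no file system of its own)

  # (6d) Cluster A: prepare the home end of the AFM relationship and NFS-export
  #      /gpfs/projects/dataset01 to Cluster B's gateway nodes
  mmafmconfig enable /gpfs/projects/dataset01

  # (6e) Cluster B: create the independent-writer cache fileset and link it at
  #      the same place in the namespace as it sits at home
  mmcrfileset gpfs0001 userdataset01 --inode-space new \
      -p afmmode=iw,afmtarget=nfs://clusterA-nfs/gpfs/projects/dataset01
  mkdir -p /gpfs0001/projects
  mmlinkfileset gpfs0001 userdataset01 -J /gpfs0001/projects/dataset01

  # (6f) Cluster B: export the new device to the stack's cluster, keeping root squash
  mmauth add stack0001-clusterC -k /tmp/stack0001-clusterC_id_rsa.pub
  mmauth grant stack0001-clusterC -f gpfs0001 -a rw -r 99:99

  # (6g) Cluster C: remote mount it as the local device /gpfs
  mmremotecluster add clusterB.example -n nsdserv01,nsdserv02 \
      -k /tmp/clusterB_id_rsa.pub
  mmremotefs add gpfs -f gpfs0001 -C clusterB.example -T /gpfs
  mmmount gpfs -a

For the AFM-over-GPFS flavour, you would remote mount Cluster A’s /gpfs on 
Cluster B’s gateway nodes instead and point afmtarget at a gpfs:/// path.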

7.       You now have fast GPFS access to this user dataset *only*, using GPFS 
clients inside the OpenStack instance nodes, and you have preserved the file 
system namespace in Cluster C’s instance nodes. If you only want to run over 
the data in the stack instances, you could pre-fetch the entire data-set using 
AFM control from Cluster A into the Cluster B file-set (if the Cluster B file 
system is big enough to hold it).
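
The pre-fetch might look something like this (the list-file requirement is 
release-dependent, and a policy scan would be a smarter way to build the list 
for a big data-set):

  # Cluster B: warm the whole cache before the users start work
  find /gpfs0001/projects/dataset01 -type f > /tmp/dataset01.files
  mmafmctl gpfs0001 prefetch -j userdataset01 --list-file /tmp/dataset01.files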

8.       Now your users are finished and want to destroy the stack – you need 
some more script “magic” (again, a rough sketch follows the list):

a.       Dismount the file system /gpfs in Cluster C.

b.      Connect to Cluster B and use AFM control to flush all the data back 
home to Cluster A.

c.       Unlink the file-set in Cluster B and force delete it; then delete the 
file system to free the NSDs back to the pool available to Cluster B.

d.      Connect back to Cluster A and unexport the original data-set directory 
structure.

e.      Throw away the VMs in the stack.
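
And a matching sketch for the tear-down (same caveats as before -- the names 
are invented, and the order matters: flush before you unlink):

  # (8a) Cluster C: unmount the remote file system everywhere
  mmumount gpfs -a

  # (8b) Cluster B: push anything still dirty in the cache back home
  mmafmctl gpfs0001 flushPending -j userdataset01

  # (8c) Cluster B: unlink and delete the fileset, then the file system,
  #      which returns its NSDs to the free pool
  mmunlinkfileset gpfs0001 userdataset01 -f
  mmdelfileset gpfs0001 userdataset01 -f
  mmumount gpfs0001 -a
  mmdelfs gpfs0001

  # (8d) Cluster A: drop the NFS export and the AFM home configuration
  mmafmconfig disable /gpfs/projects/dataset01

  # (8e) ...then destroy the stack instances in OpenStack as usual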

Things to worry about:

·         Inode numbers will differ between Cluster A and Cluster C, which 
matters if users happen to be working on the data in both places and care 
about inodes. GPFS XATTRs are preserved, though.

·         If you use AFM over NFS because Cluster A and B are far away from 
each other and laggy, then there’s no locking with your AFM cache running as an 
Independent Writer. Writes at home (Cluster A) and in cache (Cluster B from 
Cluster C) will be nondeterministic. Your users will need to know this to avoid 
disappointment.

·         If you use AFM over GPFS because Cluster A and B are near to each 
other and have a fast network, then there might still not be locking, but if 
Cluster A goes offline, it will put the AFM cache into a “dismounted” state.

·         If your users want access to other parts of the Cluster A /gpfs 
namespace within their stack instances (because you have tools they use or they 
want to see other stuff), you can export those parts read-only to a read-only 
AFM cache in Cluster B, and they will be visible in Cluster C provided you link 
the AFM caches in the right places. Remember the users have root access here.

·         AFM updates are sent from cache to home as root so users can 
potentially overflow their quota at the Cluster A home site (root doesn’t care 
about quotas at home).

·         Other frightening things might happen that I’ve not thought about.


Hope this helps!
Luke

--

Luke Raimbach
IT Manager
Oxford e-Research Centre
7 Keble Road,
Oxford,
OX1 3QG

+44(0)1865 610639

From: [email protected] On Behalf Of Adam Huffman
Sent: 02 March 2015 09:40
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Introduction

Hi Vic,

Re-emphasising that I’m still very much learning about GPFS, one of the 
approaches being discussed is running the GPFS client inside the instances. The 
concern here is over the case where users have root privileges inside their 
instances (a pretty common assumption for those used to AWS, for example) and 
the implications this may have for GPFS. Does it mean there would be a risk of 
spoofing?

Cheers,
Adam

From: Vic Cornell
Reply-To: gpfsug main discussion list
Date: Monday, 2 March 2015 09:32
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Introduction

Hi Adam,

I guess that one of the things that would help push it forward is a definition 
of what "secure" means to you.

Regards,

Vic


On 2 Mar 2015, at 09:24, Adam Huffman 
<[email protected]> wrote:


Hello

A couple of weeks ago I joined Bruno Silva’s HPC team at the Francis Crick 
Institute, with special responsibility for HPC, OpenStack and virtualization. 
I’m very much a GPFS novice so I’m hoping to be able to draw on the knowledge 
in this group, while hopefully being able to help others with OpenStack.

As Bruno stated in his message, we’re particularly interested in how to present 
GPFS to instances securely. I’ve read the discussion from November on this 
list, which didn’t seem to come to any firm conclusions. Has anyone involved 
then made substantial progress since?

Cheers,
Adam

—

Adam Huffman
Senior HPC & Virtualization Systems Engineer
The Francis Crick Institute
Gibbs Building
215 Euston Road
London NW1 2BE

T:
E: [email protected]
W: www.crick.ac.uk

The Francis Crick Institute Limited is a registered charity in England and 
Wales no. 1140062 and a company registered in England and Wales no. 06885462, 
with its registered office at 215 Euston Road, London NW1 2BE


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
