Hi All,
I am thinking about using the placement policy engine to insert custom
metadata tags at file creation, based on the fileset in which the file is
created. This would facilitate Research Data Management tasks that could
happen later in the data lifecycle.
I am also thinking about allowing users to specify additional custom metadata
tags (maybe through a fancy web interface) and potentially giving users
control over creating new filesets (e.g. for scientists running new
experiments). So… pretend this is a placement policy on my GPFS-driven
data-ingest platform:
    RULE 'RDMTEST'
      SET POOL 'instruments'
      FOR FILESET ('%GPFSRDM%10.01013%RDM%0ab34906-5357-4ca0-9d19-a470943db30a%RDM%8fc2395d-64c0-4ebd-8c71-0d2d34b3c1c0')
      WHERE SetXattr('user.rdm.parent','0ab34906-5357-4ca0-9d19-a470943db30a')
        AND SetXattr('user.rdm.ingestor','8fc2395d-64c0-4ebd-8c71-0d2d34b3c1c0')

    RULE 'DEFAULT' SET POOL 'data'
The fileset name can be meaningless (as far as the user is concerned), but it
would be linked somewhere nice that they recognise – say
/gpfs/incoming/instrument1. When it is created, the fileset would also be an
AFM cache for its ‘home’ counterpart, which exists on a much larger (also
GPFS-driven) pool of storage… so that my metadata tags are preserved, you see.
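To illustrate how the system-generated fileset name could drive the rule, here is a minimal Python sketch. It assumes the naming convention seen in the example ('%GPFSRDM%' prefix, then fields separated by '%RDM%', with the last two fields being the parent and ingestor UUIDs); that convention and the helper names parse_fileset_name and render_placement_rule are hypothetical, not anything GPFS provides.

```python
import uuid
from typing import Tuple

PREFIX = "%GPFSRDM%"      # assumed prefix, taken from the example fileset name
SEPARATOR = "%RDM%"       # assumed field separator, taken from the example


def parse_fileset_name(name: str) -> Tuple[str, str]:
    """Return (parent_uuid, ingestor_uuid) encoded in a fileset name."""
    parts = name.split(SEPARATOR)
    if len(parts) != 3 or not parts[0].startswith(PREFIX):
        raise ValueError(f"unexpected fileset name layout: {name!r}")
    _, parent, ingestor = parts
    # uuid.UUID raises ValueError if a field is not a well-formed UUID
    uuid.UUID(parent)
    uuid.UUID(ingestor)
    return parent, ingestor


def render_placement_rule(rule_name: str, pool: str, fileset: str) -> str:
    """Render a placement rule that tags new files in the given fileset."""
    parent, ingestor = parse_fileset_name(fileset)
    return (
        f"RULE '{rule_name}'\n"
        f"  SET POOL '{pool}'\n"
        f"  FOR FILESET ('{fileset}')\n"
        f"  WHERE SetXattr('user.rdm.parent','{parent}')\n"
        f"    AND SetXattr('user.rdm.ingestor','{ingestor}')\n"
    )


if __name__ == "__main__":
    example = (
        "%GPFSRDM%10.01013"
        "%RDM%0ab34906-5357-4ca0-9d19-a470943db30a"
        "%RDM%8fc2395d-64c0-4ebd-8c71-0d2d34b3c1c0"
    )
    print(render_placement_rule("RDMTEST", "instruments", example))
```

Deriving the tag values from the fileset name keeps the two in step, so a rule cannot be generated with UUIDs that disagree with the fileset it applies to.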
This potentially user-driven activity might look a bit like this:
- User logs in to web interface and creates new experiment
- Filesets (system-generated names) are created on ‘home’ and ‘ingest’
  file systems and linked into the directory namespace wherever the user
  specifies
- AFM relationships are set up and established for the ingest (cache)
  fileset to write back to the AFM home fileset (probably Independent Writer
  mode)
- A set of ‘default’ policies are defined and installed on the cache
  file system to tag data for that experiment (the user can’t change these)
- The user now specifies additional metadata tags they want added to
  their experiment data (some of this might be captured through additional
  mandatory fields in the web form for instance)
- A policy for later execution by mmapplypolicy on the AFM home file
  system is created which looks for the tags generated at ingest-time and
  applies the extra user-defined tags
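As a rough sketch of the "default policies are defined and installed" step, the following Python assembles one placement policy file from a list of experiments, puts the catch-all DEFAULT rule last, and checks the result against a size limit. The Experiment class, the helper names, and the interpretation of 1MB as 1024 * 1024 bytes are all assumptions made for illustration; the resulting text would still have to be installed by whatever mechanism the site uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumption: "1MB" is taken to mean 1024 * 1024 bytes.
POLICY_SIZE_LIMIT = 1024 * 1024


@dataclass
class Experiment:
    """Hypothetical record captured by the web interface."""
    rule_name: str
    fileset: str
    pool: str
    tags: Dict[str, str] = field(default_factory=dict)


def quote(value: str) -> str:
    """Wrap a value in single quotes, refusing values that contain one."""
    if "'" in value:
        raise ValueError(f"single quote not allowed in {value!r}")
    return f"'{value}'"


def render_rule(exp: Experiment) -> str:
    """Render one placement rule with a SetXattr call per tag."""
    lines = [
        f"RULE {quote(exp.rule_name)}",
        f"  SET POOL {quote(exp.pool)}",
        f"  FOR FILESET ({quote(exp.fileset)})",
    ]
    calls = [f"SetXattr({quote(k)},{quote(v)})" for k, v in exp.tags.items()]
    if calls:
        lines.append("  WHERE " + "\n    AND ".join(calls))
    return "\n".join(lines)


def build_policy(experiments: List[Experiment], default_pool: str) -> str:
    """Concatenate per-experiment rules, with the DEFAULT rule last."""
    rules = [render_rule(exp) for exp in experiments]
    rules.append(f"RULE 'DEFAULT' SET POOL {quote(default_pool)}")
    return "\n".join(rules) + "\n"


def fits_limit(policy_text: str, limit: int = POLICY_SIZE_LIMIT) -> bool:
    """Return True if the encoded policy text is within the size limit."""
    return len(policy_text.encode("utf-8")) <= limit
```

Because every experiment adds a rule to the same file, measuring the encoded size before installation gives an early warning as the number of filesets grows.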
There’s much more that would go on later in the lifecycle to take care of
automated HSM tiering, data publishing, movement and cataloguing of data onto
external non-GPFS file systems, etc. but I won’t go into it here. My
GPFS-related questions are:
When I install a placement policy into the file system, does the file system
need to quiesce? My suspicion is yes, because the policy needs to be consistent
on all nodes performing I/O, but I may be wrong.
What is the specific reason a placement policy file can be no larger than
1MB?
Cheers,
Luke.
Luke Raimbach
Senior HPC Data and Storage Systems Engineer
The Francis Crick Institute
Gibbs Building
215 Euston Road
London NW1 2BE
E: [email protected]
W: www.crick.ac.uk