If you need to preserve the "wackiness" of the original file and path names
(and I'm assuming you need to preserve the path names in order to avoid
collisions between migrated files from different directories which have the
same basename, and to allow the files to be found/recovered again later, etc.)
then you can use Marc's `mmfind` suggestion, coupled with the -print0 argument
to produce a null-delimited file list, which can be fed to an "xargs -0"
pipeline or "rsync -0" to do most of the work.
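
As a rough illustration of why the null-delimited list helps, here is a minimal
Python 3 sketch. It assumes the list arrives as raw bytes and that each file
should land at the same relative path under a target root. The function names
and the example roots are hypothetical, not part of mmfind or GPFS.

```python
import os


def read_nul_list(data):
    """Split a NUL-delimited path list (e.g. from -print0) into byte paths.

    Paths stay as bytes so newlines, wildcards and non-UTF-8 names survive.
    """
    return [p for p in data.split(b"\0") if p]


def target_path(src, src_root, dst_root):
    """Map src (under src_root) to the same relative path under dst_root."""
    rel = os.path.relpath(src, src_root)
    sep = os.fsencode(os.sep)
    if rel == b".." or rel.startswith(b".." + sep):
        raise ValueError("path is outside the source root: %r" % (src,))
    return os.path.join(dst_root, rel)
```

In a real pipeline the bytes would probably come from sys.stdin.buffer.read()
rather than a literal. Keeping the relative path is what avoids basename
collisions between directories.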

Test everything in a "dry-run" mode which reports what it would do without
doing it, and in a mode which copies without deleting, to help expose bugs in
the process before destroying your data. If the migration doesn't cross between
independent filesets, then file migrations could be performed using "mv"
without any actual data copying.  (For that matter, it could also be done in
two stages by hard-linking, then unlinking.)
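
The dry-run and two-stage ideas could be sketched roughly as follows. This is
only an illustration in Python 3, assuming regular files and a source and
target that allow hard links between them. The migrate() function and its
parameters are hypothetical names of my own.

```python
import os


def migrate(src, dst, dry_run=True, delete_source=False):
    """Hard-link src to dst, optionally unlinking src afterwards.

    Returns the list of actions. With dry_run=True nothing is changed.
    """
    actions = []
    if os.path.lexists(dst):
        # A target left by an earlier link-only pass is fine; anything
        # else is a collision that a human should look at.
        if not os.path.samefile(src, dst):
            raise FileExistsError(dst)
    else:
        dst_dir = os.path.dirname(dst)
        if not os.path.isdir(dst_dir):
            actions.append(("mkdir", dst_dir))
            if not dry_run:
                os.makedirs(dst_dir, exist_ok=True)
        actions.append(("link", src, dst))
        if not dry_run:
            os.link(src, dst)
    if delete_source:
        actions.append(("unlink", src))
        if not dry_run:
            os.unlink(src)
    return actions
```

A first pass with the defaults only reports. A second pass with dry_run=False
creates the links, and a third with delete_source=True removes the originals
once the target has been checked.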

But I think that there are other potential problems involved, even before 
considering things like path escaping or fileset boundaries...

If everything is predicated on the age of a file, you will need to create the 
missing directory hierarchy in the target dir structure for files which need to 
be "migrated".  If files in a directory vary in age, you may move some files 
but leave others alone (until they become old enough to migrate) creating 
incomplete and probably unusable versions at both the source and target.  What 
if a user recreates the missing files as they disappear?  As they later age, do 
you overwrite the files on the target?  What if a directory name is later 
changed to a filename or vice-versa? Will you ever need to "restore" these 
structures? If so, will you merge these back in to the original source if both 
non-empty source and target dirs exist?  Should we wait for an entire dir 
hierarchy to age out and then archive it atomically?  (We would want a way to 
know where project dir boundaries are.) 
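
If the whole-hierarchy approach were chosen, the age test might look something
like the sketch below. It is Python 3 and assumes that "age" means modification
time and that the project directory boundary is already known. Both are my
assumptions, and the function names are hypothetical.

```python
import os
import time


def newest_mtime(root):
    """Return the most recent mtime of root and everything beneath it."""
    newest = os.lstat(root).st_mtime
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            st = os.lstat(os.path.join(dirpath, name))
            newest = max(newest, st.st_mtime)
    return newest


def tree_aged_out(root, max_age_days, now=None):
    """True only if nothing under root changed within max_age_days."""
    now = time.time() if now is None else now
    return now - newest_mtime(root) > max_age_days * 86400
```

A single recently touched file keeps the whole project directory in place, so
source and target never hold partial copies of the same project.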

I would urge you to think about how complex this might actually get before 
you start performing surgery within data sets.  I would be inclined to challenge 
the original requirements to ensure that what you are able to accomplish 
matches up with the real goals without creating a raft of new operational 
problems or loss of work product.  Depending on the original goal, it may be 
possible to do this (more safely) with snapshots or tarballs.

-Paul

-----Original Message-----
From: [email protected] 
<[email protected]> On Behalf Of Jonathan Buzzard
Sent: Saturday, December 28, 2019 10:17 AM
To: [email protected]
Subject: Re: [gpfsug-discuss] Question about Policies


On 27/12/2019 14:20, [email protected] wrote:
> You would want to look for examples of external scripts that work on 
> the result of running the policy engine in listing mode.  The one 
> issue that might need some attention is the way that gpfs quotes 
> unprintable characters in the pathname. So the policy engine generates 
> the list and your external script does the moving.
>

In my experience a good starting point would be to scan the list of files from 
the policy engine and separate the files into two groups: the "normal" files, 
that is those using basic ASCII and no special characters, and the rest, also 
known as the "wacky pile".
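
One possible way to do that split is sketched below in Python 3. The whitelist
of "sane" characters is an assumption chosen for illustration (spaces, for
instance, end up in the wacky pile), and the names are hypothetical.

```python
import re

# Assumed whitelist: ASCII letters, digits and a few harmless punctuation
# characters, plus "/" as the path separator. Adjust to local taste.
SANE = re.compile(rb"[A-Za-z0-9._/+-]+")


def split_piles(paths):
    """Split byte paths into (normal, wacky) lists using the whitelist."""
    normal, wacky = [], []
    for p in paths:
        (normal if SANE.fullmatch(p) else wacky).append(p)
    return normal, wacky
```

Working on bytes rather than decoded strings means a name that is not valid
UTF-8 simply lands in the wacky pile instead of raising an error.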

Given that you are UK based it is not unreasonable to expect all path and file 
names to be in English. There might (and if not probably should) be an 
institutional policy mandating it. It is not much use if a researcher saves 
everything in Greek, then gets knocked over by a bus, and the person picking up 
the work is Spanish, for example.

Hopefully the "wacky pile" is small, however expect to find all sorts of 
bizarre file and path names in it. We are talking wildcards, back ticks, even 
newline characters to name but a few.

Depending on the amount of data in the "wacky" pile you might just want to 
forget about moving them, as they are orders of magnitude more difficult to 
deal with than files with "sane" path and file names and can rapidly soak up 
large chunks of time trying to deal with them in scripts.

JAB.

--
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG 
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
