Hi Jim,

If you have never worked with policy rules before, you may want to start by 
getting comfortable with them.

In the /usr/lpp/mmfs/samples/ilm path you will find several example templates 
that you can experiment with. I would start with the 'list' rules first.
Some of those templates are a bit complex, so here is one script that I use on 
a regular basis to detect files with more than 1MB of space allocated (you can 
even exclude specific filesets):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
dss-mgt1:/scratch/r/root/mmpolicyRules # cat mmpolicyRules-list-large
/* A macro to abbreviate VARCHAR */
define([vc],[VARCHAR($1)])

/* Define one external list */
RULE EXTERNAL LIST 'largefiles' EXEC 
'/gpfs/fs0/scratch/r/root/mmpolicyRules/mmpolicyExec-list'

/* Generate a list of all files that have more than 1MB of space allocated. */
RULE 'r2' LIST 'largefiles'
        SHOW('-u' vc(USER_ID) || ' -s' || vc(FILE_SIZE))
        /*FROM POOL 'system'*/
        FROM POOL 'data'
        /*FOR FILESET('root')*/
        WEIGHT(FILE_SIZE)
        WHERE KB_ALLOCATED > 1024

/* Files in special filesets, such as mmpolicyRules, are never moved or
   deleted */
RULE 'ExcSpecialFile' EXCLUDE
        FOR FILESET('mmpolicyRules','todelete','tapenode-stuff','toarchive')
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



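And here is another to detect files not accessed for more than 6 months. I 
found it more effective to check both atime and ctime (note that the rule below 
tests CREATION_TIME; use CHANGE_TIME instead if you want the POSIX ctime). You 
could combine this with the one above to filter on file size as well.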

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
dss-mgt1:/scratch/r/root/mmpolicyRules # cat mmpolicyRules-list-atime-ctime-gt-6months
/* A macro to abbreviate VARCHAR */
define([vc],[VARCHAR($1)])

/* Define one external list */
RULE EXTERNAL LIST 'accessedfiles' EXEC 
'/gpfs/fs0/scratch/r/root/mmpolicyRules/mmpolicyExec-list'

/* Generate a list of all files, directories, plus all other file system
   objects, like symlinks, named pipes, etc, accessed prior to a certain date
   AND that are not owned by root. Include the owner's id with each object and
   sort them by the owner's id */

/* Files in special filesets, such as mmpolicyRules, are never moved or
   deleted */
RULE 'ExcSpecialFile' EXCLUDE
        FOR FILESET ('scratch-root','todelete','root')

RULE 'r5' LIST 'accessedfiles'
        DIRECTORIES_PLUS
        FROM POOL 'data'
        SHOW('-u' vc(USER_ID) || ' -a' || vc(ACCESS_TIME) || ' -c' ||
             vc(CREATION_TIME) || ' -s ' || vc(FILE_SIZE))
        WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) > 183)
                AND (DAYS(CURRENT_TIMESTAMP) - DAYS(CREATION_TIME) > 183)
                AND NOT USER_ID = 0
                AND NOT (PATH_NAME LIKE '/gpfs/fs0/scratch/r/root/%')
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Note that both of these scripts work on a system-wide (or root fileset) basis 
and will not give you per-directory results unless you run them several times 
against specific directories (not very efficient). To produce lists per 
directory you would need to do some post-processing on the output, with 'awk' 
or some other scripting language. If you need some samples, I can send you some.
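
To give an idea of that post-processing, here is a minimal Python sketch that 
sums file sizes per directory from a list produced by the first rule. It 
assumes each record looks like "<inode> <gen> <snapid> -u<uid> -s<bytes> -- 
<path>"; check that against your own list files before relying on it. The 
function names, the sample records and the 'depth' parameter are made up for 
illustration.

```python
import posixpath
import re
from collections import defaultdict

# Assumed record layout (hypothetical sample, not taken from a real run):
#   <inode> <generation> <snapid> -u<uid> -s<bytes> -- <full path>
SIZE_RE = re.compile(r"-s\s*(\d+)")


def parse_record(line):
    """Return (path, size_in_bytes), or None if the line does not match."""
    head, sep, path = line.rstrip("\n").partition(" -- ")
    if not sep:
        return None
    match = SIZE_RE.search(head)
    if match is None:
        return None
    return path, int(match.group(1))


def totals_by_directory(lines, depth):
    """Sum file sizes, grouped by the first `depth` components of the parent directory."""
    totals = defaultdict(int)
    for line in lines:
        record = parse_record(line)
        if record is None:
            continue  # skip anything that is not a list record
        path, size = record
        parts = posixpath.dirname(path).strip("/").split("/")
        totals["/" + "/".join(parts[:depth])] += size
    return dict(totals)


sample = [
    "1001 1 0  -u501 -s2097152 -- /gpfs/fs0/scratch/j/jim/run1/out.dat",
    "1002 1 0  -u501 -s1048576 -- /gpfs/fs0/scratch/j/jim/run2/log.txt",
    "1003 1 0  -u502 -s4194304 -- /gpfs/fs0/scratch/a/ann/big.bin",
]
for directory, total in sorted(totals_by_directory(sample, depth=5).items()):
    print(directory, total)
```

Grouping by a fixed depth keeps memory use proportional to the number of 
directories at that level rather than to the number of files, which matters 
with hundreds of millions of inodes.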


And finally, you need to be more specific about what you mean by 'archivable'. 
Once you produce the lists you can do several things with them, or leverage the 
rules to actually execute actions such as move, delete, or HSM operations. The 
/usr/lpp/mmfs/samples/ilm path has some samples for those as well.
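
One way to pin down 'archivable' is to write it as an explicit test over 
per-directory totals. The sketch below is only an illustration: it borrows the 
5TB and two-year figures from the question quoted further down, and the 
summary dictionary, directory names and function name are hypothetical.

```python
from datetime import datetime, timedelta

TB = 1000 ** 4  # decimal terabyte; use 1024 ** 4 if you count in TiB


def is_archivable(total_bytes, newest_atime, now,
                  min_bytes=5 * TB, min_idle=timedelta(days=730)):
    """True if the directory is big enough and nothing in it was read recently."""
    return total_bytes >= min_bytes and (now - newest_atime) >= min_idle


# Hypothetical per-directory summary: path -> (total bytes, newest atime seen)
summary = {
    "/gpfs/fs0/projects/alpha": (7 * TB, datetime(2017, 6, 1)),
    "/gpfs/fs0/projects/beta": (12 * TB, datetime(2020, 1, 15)),
    "/gpfs/fs0/projects/gamma": (1 * TB, datetime(2016, 3, 9)),
}
now = datetime(2020, 4, 3)
archivable = sorted(path for path, (size, atime) in summary.items()
                    if is_archivable(size, atime, now))
print(archivable)
```

Keeping the thresholds as parameters makes it easy to try the 10TB variant or 
a different idle period without touching the policy rules themselves.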



On 4/3/2020 18:25:33, Jim Kavitsky wrote:
Hello everyone,
I'm managing a low-multi-petabyte Scale filesystem with hundreds of millions of 
inodes, and I'm looking for the best way to locate archivable directories. For 
example, these might be directories whose contents were greater than 5 or 
10TB, and whose contents had atimes older than two years.

Has anyone found a great way to do this with a policy engine run? If not, is 
there another good way that anyone would recommend? Thanks in advance,

Yes, there is another way: the 'mmfind' utility, also in the same sample path. 
You have to compile it for your OS (see mmfind.README). This is a very powerful 
canned procedure that lets you use the "-exec" option just as in the normal 
Linux version of 'find'. I use it very often, and it's just as efficient as the 
policy-rules-based alternative.

Good luck.

Keep safe and confined.

Jaime



Jim Kavitsky

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


.
.
.        ************************************
          TELL US ABOUT YOUR SUCCESS STORIES
         http://www.scinethpc.ca/testimonials
         ************************************
---
Jaime Pinto - Storage Analyst
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140
Toronto, ON, M5G1M1
P: 416-978-2755
C: 416-505-1477