Hi All,

I think we *may* be able to wrap this saga up…  ;-)

Dave, in regard to your question: all I know is that the tail end of the log 
file looks “normal”, the same as it has for all the successful pool migrations 
I’ve done in the past few years.

It looks like the hard links were the problem.  We have one group with a 
fileset on our filesystem that they use for backing up Linux boxes in their 
lab.  That one fileset has thousands and thousands (I haven’t counted, but 
based on the output of that Perl script I wrote it could well be millions) of 
files with anywhere from 50 to 128 hard links each … those files ranged from a 
few KB to a few MB in size.

From what Marc said, my understanding is that, with the way I had my policy 
rules written, mmapplypolicy was counting each hard link as a separate file and 
therefore thought it would be moving 50 to 128 times as much space to the 
gpfs23capacity pool as those files really occupy.  Marc can correct me or 
clarify further if necessary.  He directed me to add:

SIZE(KB_ALLOCATED/NLINK)

to both of my migrate rules in my policy file.  I did so and kicked off another 
mmapplypolicy last night, which is still running.  However, the prediction 
section now says:

[I] GPFS Policy Decisions and File Choice Totals:
 Chose to migrate 40050141920KB: 2051495 of 2051495 candidates;
Predicted Data Pool Utilization in KB and %:
Pool_Name                   KB_Occupied        KB_Total  Percent_Occupied
gpfs23capacity             104098980256    124983549952     83.290145220%
gpfs23data                 168478368352    343753326592     49.011414674%
system                                0               0      0.000000000% (no user data)
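
For reference, here is the general shape of one of the two migrate rules after 
that change.  The rule name, threshold, weight, and WHERE condition below are 
placeholders rather than copies of my actual rules, and the LIMIT(98) just 
reflects the 98% figure discussed in this thread:

RULE 'ToCapacity'
  MIGRATE FROM POOL 'gpfs23data'
  THRESHOLD(...) WEIGHT(...)
  TO POOL 'gpfs23capacity' LIMIT(98)
  SIZE(KB_ALLOCATED/NLINK)
  WHERE ...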

So now it’s going to move every file it can that matches my policies: once it 
figured out that a lot of those candidates are hard links, it turned out I 
don’t have enough files matching the criteria to fill the gpfs23capacity pool 
to the 98% limit the way mmapplypolicy thought I did before.  According to the 
log file, it’s happily chugging along migrating files, and mmdf agrees that my 
gpfs23capacity pool is gradually getting more full (I have it QOSed, of course):

Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB)
eon35Ansd               58.2T       35 No       Yes          25.33T ( 44%)        68.13G ( 0%)
eon35Dnsd               58.2T       35 No       Yes          25.33T ( 44%)        68.49G ( 0%)
                -------------                         -------------------- -------------------
(pool total)           116.4T                                50.66T ( 44%)        136.6G ( 0%)

My sincere thanks to all who took the time to respond to my questions.  Of 
course, that goes double for Marc.

We (Vanderbilt) seem to have a long tradition of finding edge cases in GPFS, 
going all the way back to when we originally moved off of an NFS server to 
GPFS (2.2 or 2.3?) back in 2005.  I was creating individual tarballs of each 
user’s home directory on the NFS server, copying the tarball to one of the NSD 
servers, and untarring it there (I don’t remember why we weren’t rsync’ing, but 
there was a reason).  Everything was working just fine except for one user.  
Every time I tried to untar her home directory on GPFS it barfed part of the 
way through … it turns out that until then IBM hadn’t considered that someone 
would want to put 6 million files in one directory.  Gotta love those users!  ;-)

Kevin

On Apr 18, 2017, at 10:31 AM, David D. Johnson <[email protected]> wrote:

I have an observation, which may merely serve to show my ignorance: is it 
significant that the words “EXTERNAL EXEC/script” are seen below?
If migrating between storage pools within the cluster, I would expect the PIT 
engine to do the migration.
HSM (off-cluster, tape libraries, etc.) is where I would expect to need a 
script to actually do the work.

[I] 2017-04-18@09:06:51.124 Policy execution. 1620263 files dispatched.
[I] A total of 1620263 files have been migrated, deleted or processed by an 
EXTERNAL EXEC/script;
        0 'skipped' files and/or errors.

— ddj
Dave Johnson
Brown University

On Apr 18, 2017, at 11:11 AM, Marc A Kaplan <[email protected]> wrote:

ANYONE else reading this saga?  Who uses mmapplypolicy to migrate files within 
multi-TB file systems?  Problems? Or all working as expected?

------

Well, again mmapplypolicy "thinks" it has "chosen" 1.6 million files whose 
total size is 61 terabytes, and that migrating those will bring the occupancy 
of the gpfs23capacity pool to 98%, and then we're done.

So now I'm wondering where this is going wrong.  Is there some bug in the 
reckoning inside of mmapplypolicy or somewhere else in GPFS?

Sure, you can put in a PMR, and probably should.  I'm guessing whoever picks up 
the PMR will end up calling or emailing me ... but maybe she can do some of the 
clerical work for us...

While we're waiting for that... Here's what I suggest next.

Add  a clause ...

SHOW(varchar(KB_ALLOCATED) || ' n=' || varchar(NLINK))

before the WHERE clause to each of your rules.

Re-run the command with options  '-I test -L 2'  and collect the output.
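
For example, something along these lines (the file system device, policy file 
name, and output path here are just placeholders):

mmapplypolicy gpfs23 -P policy.rules -I test -L 2 > /tmp/mmapplypolicy-test.out 2>&1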

We're not actually going to move any data, but we're going to look at the files 
and file sizes that are "chosen"...

You should see 1.6 million lines that look kind of like this:

/yy/dat/bigC     RULE 'msx' MIGRATE FROM POOL 'system' TO POOL 'xtra' WEIGHT(inf) SHOW( 1024 n=1)

Run a script over the output to add up all the SHOW() values in the lines that 
contain TO POOL 'gpfs23capacity' and verify that they do indeed add up to 
61TB...  (The SHOW() value is in KB, so the numbers should add up to about 61 
billion.)
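
Any quick throwaway script will do; here is a sketch in Python that reads the 
-L 2 output on stdin (adjust the regular expression if your SHOW() string is 
formatted differently):

#!/usr/bin/env python
# Sum the SHOW() values, in KB, for candidates chosen for the gpfs23capacity
# pool in 'mmapplypolicy ... -I test -L 2' output.
import re
import sys

show_pat = re.compile(r"SHOW\(\s*(\d+)\s+n=(\d+)\)")
total_kb = 0
nfiles = 0
for line in sys.stdin:
    if "TO POOL 'gpfs23capacity'" not in line:
        continue
    m = show_pat.search(line)
    if m:
        total_kb += int(m.group(1))   # KB_ALLOCATED as reported by SHOW()
        nfiles += 1

print("%d candidate lines, %d KB total (%.2f TiB)"
      % (nfiles, total_kb, total_kb / float(1 << 30)))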

That sanity checks the policy arithmetic.  Let's assume that's okay.

Then the next question is whether the individual numbers are correct... Zach 
Giles made a suggestion... which I'll interpret as: find some of the biggest of 
those files and check that they really are that big.
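
A quick way to spot-check a few of them (paths made up): compare the 
KB_ALLOCATED and n= values in the SHOW() output against what ls -ls (allocated 
blocks), du -k (allocated KB), and stat -c '%h' (link count) report for the 
same files.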

At this point, I really don't know, but I'm guessing there are some 
discrepancies in the reported KB_ALLOCATED numbers for many of the files... 
and/or they are "illplaced" - the data blocks aren't all in the pool named by 
FROM POOL ...

HMMMM....  I just thought about this some more and added the NLINK statistic.  
It would be unusual for this to be a big problem, but files that are hard 
linked are
not recognized by mmapplypolicy as sharing storage...
This has not come to my attention as a significant problem -- does the file 
system in question have significant GBs of hard linked files?
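
(To put concrete numbers on it: a 1 MB file with 100 hard links shows up as 
100 separate candidates, each counted at its full 1 MB, so the space reckoning 
sees roughly 100 MB of data to move when only 1 MB of blocks would actually 
change pools.  Dividing the accounted size by NLINK collapses those 100 entries 
back to about 1 MB total.)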

The truth is that you're the first customer/user/admin in a long time to 
question/examine how mmapplypolicy does its space reckoning ...
Optimistically that means it works fine for most customers...

So sorry, something unusual about your installation or usage...





_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
