Hi All,

And I’m not really down, but it is a rainy Monday morning here and GPFS did 
give me a scare in the last hour, so I thought that was a funny subject line.

So I have a >1 PB filesystem with 3 pools:  1) the system pool, which contains 
metadata only,  2) the data pool, which is where all I/O goes by default, and 
3) the capacity pool, which is where old crap gets migrated to.
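
In case it helps frame things, the placement/migration side is the usual ILM 
policy arrangement, something along these lines (the rule names, the 365-day 
age, and the WHERE clause are made up for illustration; only the pool names 
match what I described above):

    /* new files land in the data pool by default */
    RULE 'placement' SET POOL 'data'

    /* hypothetical age-out rule: move stale files to the capacity pool */
    RULE 'age_out' MIGRATE FROM POOL 'data'
         TO POOL 'capacity'
         WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 365

That gets run periodically via mmapplypolicy; nothing exotic.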

I logged on this morning to see an alert that my data pool was 100% full.  I 
ran an mmdf from the cluster manager and, sure enough:

(pool total)           509.3T                                     0 (  0%)             0 ( 0%)

I immediately tried copying a file there and it worked, so I figured GPFS must 
be failing writes over to the capacity pool, but an mmlsattr on the file I 
copied showed it in the data pool.  Hmmm.
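
For the record, the check was basically this (paths are made up; it's the -L 
output that shows the pool assignment):

    cp /some/random/file /gpfs/fs1/tmp/testfile
    mmlsattr -L /gpfs/fs1/tmp/testfile | grep -i "storage pool"
    # -> storage pool name:     data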

I also noticed that “df -h” said that the filesystem had 399 TB free, while 
mmdf said it only had 238 TB free.  Hmmm.
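
In other words, these two were disagreeing by roughly 160 TB (filesystem name 
made up here):

    df -h /gpfs/fs1     # filesystem-level view: 399T free
    mmdf fs1            # per-disk / per-pool view: 238T free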

So after some fruitless poking around I decided that, whatever was going on, I 
should kill the mmrestripefs I had running on the capacity pool … let me 
emphasize that … I had a restripe running on the capacity pool only (via the 
“-P” option to mmrestripefs), but it was the data pool that said it was 100% 
full.
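
Just to spell it out, the restripe was of this general form (device name made 
up, and -b is only a stand-in for whichever operation flag it actually was):

    # restripe restricted to files in the capacity pool only
    mmrestripefs fs1 -b -P capacity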

I’m sure many of you have already figured out where this is going … after 
killing the restripe I ran mmdf again and:

(pool total)           509.3T                                  159T ( 31%)        1.483T ( 0%)

I have never seen anything like this before … any ideas, anyone?  PMR time?

Thanks!

Kevin