Hello GPFS admins! I hope everybody has had a great start to the new year so far.

Lately, I've had a few of my users get an error similar to:

      error creating file: no space left on device.


This happens when they try to create even simple files (e.g., with the Linux 
`touch` command). However, if they try again a second or two later, the file is 
created without a problem and they carry on with their work. I can never predict 
when they are likely to hit the 'no space left on device' error. The file system 
does see many files being created in parallel, depending on system usage and the 
movement of files from other sites.
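
To catch this in the act, my plan is to leave a small watcher running on a login 
node. Below is a minimal sketch in Python (the /project mount point and probe 
path are placeholders for our real paths) that tries to create a file every 
second and, whenever it gets ENOSPC, logs the free-inode count that statvfs 
reports at that moment:

--
#!/usr/bin/env python
# Minimal ENOSPC watcher sketch: try to create (and remove) a probe file once
# a second; when the create fails with ENOSPC, record the timestamp and the
# free/total inode counts reported by statvfs for the mount point.
import errno, os, time

MOUNT = "/project"                              # placeholder mount point
PROBE = os.path.join(MOUNT, "tmp", "enospc_probe")

while True:
    try:
        fd = os.open(PROBE, os.O_CREAT | os.O_WRONLY | os.O_EXCL, 0o600)
        os.close(fd)
        os.unlink(PROBE)
    except OSError as e:
        if e.errno == errno.ENOSPC:
            st = os.statvfs(MOUNT)
            print("%s ENOSPC: %d of %d inodes free" % (
                time.strftime("%Y-%m-%d %H:%M:%S"), st.f_ffree, st.f_files))
        elif e.errno == errno.EEXIST:
            os.unlink(PROBE)            # leftover probe from a previous run
        else:
            raise
    time.sleep(1)
--

If the ENOSPC hits line up with moments when the free-inode count bottoms out, 
that would back up the theory I describe below.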

First, though, let me describe our environment a little better. We have three 
GPFS file systems (home, project, gscratch) on a RHEL 6.3 InfiniBand HPC 
cluster. The GPFS version is 3.5.0-11. We use fileset quotas (on block limits, 
not file limits) for each file system. Each user has a home fileset for storing 
basic configuration files, notes, and other small files. Each user belongs to 
at least one project, and the quota is shared among the users of that project. 
The gscratch file system is similar to the project file system except that 
files are deleted after ~9 days.

The (perhaps) partially good news is that the error mentioned above occurs only 
on the project file system; at least, we have not observed it on the home or 
gscratch file systems. Here's my investigation so far:


1.) I checked the fileset quota on one of the affected filesets:

--
# mmlsquota -j ModMast project
                         Block Limits                                    |     File Limits
Filesystem type             KB      quota      limit   in_doubt    grace |    files   quota    limit in_doubt    grace  Remarks
project    FILESET   953382016          0 16106127360          0     none |  8666828       0        0        0     none
--

It would seem from that output that the fileset is indeed well under its block 
quota (roughly 0.9 TiB used against a 15 TiB hard limit).


2.) Then I checked the overall file system to see whether its capacity or 
inodes are nearly exhausted:

--
# mmdf project
disk                disk size  failure holds    holds              free KB             free KB
name                    in KB    group metadata data        in full blocks        in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 397 TB)
U01_L0            15623913472       -1 Yes      Yes      7404335104 ( 47%)     667820032 ( 4%)
U01_L1            15623913472       -1 Yes      Yes      7498215424 ( 48%)     642773120 ( 4%)
U01_L2            15623913472       -1 Yes      Yes      7497969664 ( 48%)     642664576 ( 4%)
U01_L3            15623913472       -1 Yes      Yes      7496232960 ( 48%)     644327936 ( 4%)
U01_L4            15623913472       -1 Yes      Yes      7499296768 ( 48%)     640117376 ( 4%)
U01_L5            15623913472       -1 Yes      Yes      7494881280 ( 48%)     644168320 ( 4%)
U01_L6            15623913472       -1 Yes      Yes      7494164480 ( 48%)     643673216 ( 4%)
U01_L7            15623913472       -1 Yes      Yes      7497433088 ( 48%)     639918976 ( 4%)
U01_L8            15623913472       -1 Yes      Yes      7494139904 ( 48%)     645130240 ( 4%)
U01_L9            15623913472       -1 Yes      Yes      7498375168 ( 48%)     639979520 ( 4%)
U01_L10           15623913472       -1 Yes      Yes      7496028160 ( 48%)     641909632 ( 4%)
U01_L11           15623913472       -1 Yes      Yes      7496093696 ( 48%)     643749504 ( 4%)
U01_L12           15623913472       -1 Yes      Yes      7496425472 ( 48%)     641556992 ( 4%)
U01_L13           15623913472       -1 Yes      Yes      7495516160 ( 48%)     643395840 ( 4%)
U01_L14           15623913472       -1 Yes      Yes      7496908800 ( 48%)     642418816 ( 4%)
U01_L15           15623913472       -1 Yes      Yes      7495823360 ( 48%)     643580416 ( 4%)
U01_L16           15623913472       -1 Yes      Yes      7499939840 ( 48%)     641538688 ( 4%)
U01_L17           15623913472       -1 Yes      Yes      7497355264 ( 48%)     642184704 ( 4%)
U13_L0             2339553280       -1 Yes      No       2322395136 ( 99%)       8190848 ( 0%)
U13_L1             2339553280       -1 Yes      No       2322411520 ( 99%)       8189312 ( 0%)
U13_L12           15623921664       -1 Yes      Yes      7799422976 ( 50%)     335150208 ( 2%)
U13_L13           15623921664       -1 Yes      Yes      8002662400 ( 51%)     126059264 ( 1%)
U13_L14           15623921664       -1 Yes      Yes      8001093632 ( 51%)     126107648 ( 1%)
U13_L15           15623921664       -1 Yes      Yes      8001732608 ( 51%)     126167168 ( 1%)
U13_L16           15623921664       -1 Yes      Yes      8000077824 ( 51%)     126240768 ( 1%)
U13_L17           15623921664       -1 Yes      Yes      8001458176 ( 51%)     126068480 ( 1%)
U13_L18           15623921664       -1 Yes      Yes      7998636032 ( 51%)     127111680 ( 1%)
U13_L19           15623921664       -1 Yes      Yes      8001892352 ( 51%)     125148928 ( 1%)
U13_L20           15623921664       -1 Yes      Yes      8001916928 ( 51%)     126187904 ( 1%)
U13_L21           15623921664       -1 Yes      Yes      8002568192 ( 51%)     126591616 ( 1%)
                -------------                         -------------------- -------------------
(pool total)     442148765696                          219305402368 ( 50%)   13078121728 ( 3%)

                =============                         ==================== ===================
(data)           437469659136                          214660595712 ( 49%)   13061741568 ( 3%)
(metadata)       442148765696                          219305402368 ( 50%)   13078121728 ( 3%)
                =============                         ==================== ===================
(total)          442148765696                          219305402368 ( 50%)   13078121728 ( 3%)

Inode Information
-----------------
Number of used inodes:       133031523
Number of free inodes:         1186205
Number of allocated inodes:  134217728
Maximum number of inodes:    134217728
--

Eureka! From this it seems the inode count is teetering on its limit. At this 
point I think it would be best to educate our users not to write millions of 
small text files, since I don't believe the GPFS block size (currently 4 MB) 
can be reduced after the fact. The system was originally targeted at large 
reads/writes from traditional HPC users, but we have since diversified our user 
base to include computing areas outside traditional HPC. The documentation 
states that when files are being created in parallel, a minimum of 5% of the 
inodes need to be free or performance will suffer. From the output above we 
have less than 1% free, which I think is the root of our problem.
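
Just to put numbers on that 5% guideline, here's the quick arithmetic (a bit of 
plain Python using the inode figures from the mmdf output above; nothing 
GPFS-specific, just the math):

--
# Inode headroom check, using the numbers reported by mmdf above.
used_inodes = 133031523
max_inodes  = 134217728

free_inodes = max_inodes - used_inodes
print("free: %d (%.2f%% of maximum)" % (free_inodes, 100.0 * free_inodes / max_inodes))
# -> free: 1186205 (0.88% of maximum)

# Smallest maximum that would leave 5% free at today's usage level:
needed_max = int(used_inodes / 0.95) + 1
print("max for 5%% free today: %d" % needed_max)
# -> max for 5% free today: 140033183
--

So even a modest bump in the maximum would restore the recommended headroom, at 
least at current usage.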

Therefore, is there a method to safely increase the maximum inode count, and 
can it be done while the file system is in use, or should it be unmounted 
first? I've read the man pages and searched online and found hints suggesting 
the command below, but I'm curious about its safety during operation:

      mmchfs project --inode-limit <new_max_inode_count>

The man page describes the limit as:

      max_files = total_filesystem_space / (inode_size + subblock_size)

and IBM's documentation defines the subblock size as 1/32 of the block size 
(our block size is 4 MB, so the subblock is 128 KB). Using a 512-byte inode 
size and the pool total from the mmdf output above, I calculate that the 
maximum number of inodes I could potentially have is:

      3440846425

That is approximately 25x the current maximum, so I believe there is room to 
increase the inode count without too much worry. Are there any caveats to my 
logic here? I'm not saying I'll raise it to the theoretical maximum right away, 
since preallocated inode space would eat into the usable capacity of the system.
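
For completeness, here's the arithmetic behind that figure in plain Python. The 
512-byte inode size is my assumption (it's what makes the numbers work out; 
mmlsfs would confirm the real value), and the pool total comes from the mmdf 
output above:

--
# Theoretical inode ceiling from the man-page formula:
#     max_files = total_filesystem_space / (inode_size + subblock_size)
total_space_bytes = 442148765696 * 1024      # mmdf pool total, reported in KB
inode_size        = 512                      # bytes (assumed; check mmlsfs)
block_size        = 4 * 1024 * 1024          # 4 MB block size
subblock_size     = block_size // 32         # 128 KB subblocks

max_files = total_space_bytes // (inode_size + subblock_size)
print(max_files)                             # -> 3440846425 (~25x current max)
--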

Thanks for any comments and recommendations. I have a sizable maintenance 
period coming up due to datacenter power upgrades, with ~2 weeks of downtime, 
and I'm trying to get all my ducks in a row. If I need to do something 
time-consuming with the file systems, I'd like to know ahead of time so I can 
do it during that maintenance window, as I probably won't get another one for 
many months afterward.

Again, thank you all!

Jared Baker
ARCC


