You should check the inode counts for each of the filesets using the 
mmlsfileset command.  You should check the local disk space on all the nodes.

I presume you are aware that Scale 4.2.3 has been out of support for 4 years.

Fred

Fred Stock, Spectrum Scale Development Advocacy
[email protected] | 720-430-8821



From: gpfsug-discuss <[email protected]> on behalf of Rob 
Kudyba <[email protected]>
Date: Thursday, June 6, 2024 at 5:39 PM
To: gpfsug main discussion list <[email protected]>
Subject: [EXTERNAL] Re: [gpfsug-discuss] No space left on device, but plenty of 
quota space for inodes and blocks
Are you seeing the issues across the whole file system or in certain areas?

Only with accounts in GPFS; local accounts and root do not get this.

That sounds like inode exhaustion to me (and based on it not being block 
exhaustion as you’ve demonstrated).

What does a “df -i /cluster” show you?

We bumped it up a few weeks ago:
df -i /cluster
Filesystem        Inodes     IUsed     IFree IUse% Mounted on
cluster           276971520 154807697 122163823   56% /cluster

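The same check can be scripted. The following is a minimal sketch, assuming Python 3 is available on the node; the helper name `inode_usage` is made up for illustration. It reads the inode counters with `os.statvfs`, which should give roughly the figures that `df -i` prints for the file system containing the given path.

```python
import os
from typing import NamedTuple


class InodeUsage(NamedTuple):
    total: int  # inode slots reported by the file system ("Inodes" in df -i)
    free: int   # free inodes ("IFree" in df -i)
    used: int   # total - free ("IUsed" in df -i)

    @property
    def percent_used(self) -> float:
        # Some file systems report 0 total inodes (see /boot/efi in this thread).
        return 100.0 * self.used / self.total if self.total else 0.0


def inode_usage(path: str) -> InodeUsage:
    """Return inode counters for the file system that contains `path`."""
    st = os.statvfs(path)
    return InodeUsage(total=st.f_files, free=st.f_ffree,
                      used=st.f_files - st.f_ffree)


if __name__ == "__main__":
    # "." is used so the sketch runs anywhere; on the cluster you would
    # pass "/cluster" or the directory where writes are failing.
    usage = inode_usage(".")
    print(f"inodes={usage.total} used={usage.used} free={usage.free} "
          f"({usage.percent_used:.0f}% used)")
```

Note that this reports the limit the file system advertises to the operating system. It does not show per-fileset limits, which is what `mmlsfileset` is for.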

Or if this is only in a certain area you can “cd” into that directory and run a 
“df -i .”

As root on a login node:
df -i
Filesystem        Inodes     IUsed     IFree IUse% Mounted on
/dev/sda2       20971520    169536  20801984    1% /
devtmpfs        12169978       528  12169450    1% /dev
tmpfs           12174353      1832  12172521    1% /run
tmpfs           12174353        77  12174276    1% /dev/shm
tmpfs           12174353        15  12174338    1% /sys/fs/cgroup
/dev/sda1              0         0         0     - /boot/efi
/dev/sda3       52428800      2887  52425913    1% /var
/dev/sda7      277368832     35913 277332919    1% /local
/dev/sda5      104857600       398 104857202    1% /tmp
tmpfs           12174353         1  12174352    1% /run/user/551336
tmpfs           12174353         1  12174352    1% /run/user/0
moto           276971520 154807697 122163823   56% /cluster
tmpfs           12174353         3  12174350    1% /run/user/441245
tmpfs           12174353        12  12174341    1% /run/user/553562
tmpfs           12174353         1  12174352    1% /run/user/525583
tmpfs           12174353         1  12174352    1% /run/user/476374
tmpfs           12174353         1  12174352    1% /run/user/468934
tmpfs           12174353         5  12174348    1% /run/user/551200
tmpfs           12174353         1  12174352    1% /run/user/539143
tmpfs           12174353         1  12174352    1% /run/user/488676
tmpfs           12174353         1  12174352    1% /run/user/493713
tmpfs           12174353         1  12174352    1% /run/user/507831
tmpfs           12174353         1  12174352    1% /run/user/549822
tmpfs           12174353         1  12174352    1% /run/user/500569
tmpfs           12174353         1  12174352    1% /run/user/443748
tmpfs           12174353         1  12174352    1% /run/user/543676
tmpfs           12174353         1  12174352    1% /run/user/451446
tmpfs           12174353         1  12174352    1% /run/user/497945
tmpfs           12174353         6  12174347    1% /run/user/554672
tmpfs           12174353        32  12174321    1% /run/user/554653
tmpfs           12174353         1  12174352    1% /run/user/30094
tmpfs           12174353         1  12174352    1% /run/user/470790
tmpfs           12174353        59  12174294    1% /run/user/553037
tmpfs           12174353         1  12174352    1% /run/user/554670
tmpfs           12174353         1  12174352    1% /run/user/548236
tmpfs           12174353         1  12174352    1% /run/user/547288
tmpfs           12174353         1  12174352    1% /run/user/547289

You may need to allocate more inodes to an independent inode fileset somewhere. 
 Especially with something as old as 4.2.3 you won’t have auto-inode expansion 
for the filesets.

Do we have to restart any service after upping the inode count?

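For reference, the inode limit of an independent fileset is raised with `mmchfileset ... --inode-limit`. Below is a minimal sketch that only builds the command line and does not run anything. The fileset name `home` and the numbers are hypothetical placeholders, and the install path is the one used elsewhere in this thread. Please check the syntax against the documentation for your release before using it.

```python
import shlex
from typing import List, Optional

MMFS_BIN = "/usr/lpp/mmfs/bin"  # path as it appears in this thread


def build_inode_limit_cmd(device: str, fileset: str, max_inodes: int,
                          prealloc: Optional[int] = None) -> List[str]:
    """Build (but do not execute) an mmchfileset call that sets the inode limit.

    `prealloc` is the optional number of inodes to preallocate; it is
    appended after a colon, as in MaxNumInodes:NumInodesToPreallocate.
    """
    if max_inodes <= 0:
        raise ValueError("max_inodes must be positive")
    limit = str(max_inodes)
    if prealloc is not None:
        if not 0 < prealloc <= max_inodes:
            raise ValueError("prealloc must be between 1 and max_inodes")
        limit += f":{prealloc}"
    return [f"{MMFS_BIN}/mmchfileset", device, fileset, "--inode-limit", limit]


if __name__ == "__main__":
    # "home" and 20000000 are hypothetical; take real values from mmlsfileset.
    cmd = build_inode_limit_cmd("cluster", "home", 20_000_000)
    print(shlex.join(cmd))
```

Printing the command first makes it easy to review before running it by hand as root.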

Best,

J.D. Maloney
Lead HPC Storage Engineer | Storage Enabling Technologies Group
National Center for Supercomputing Applications (NCSA)

Hi JD, I took an intermediate LCI workshop with you at Univ of Cincinnati!


From: gpfsug-discuss <[email protected]> on behalf of Rob Kudyba <[email protected]>
Date: Thursday, June 6, 2024 at 3:50 PM
To: [email protected]
Subject: [gpfsug-discuss] No space left on device, but plenty of quota space 
for inodes and blocks
Running GPFS 4.2.3 on a DDN GridScaler and users are getting the No space left 
on device message when trying to write to a file. In /var/adm/ras/mmfs.log the 
only recent errors are this:

2024-06-06_15:51:22.311-0400: mmcommon getContactNodes cluster failed. Return code -1.
2024-06-06_15:51:22.311-0400: The previous error was detected on node x.x.x.x (headnode).
2024-06-06_15:53:25.088-0400: mmcommon getContactNodes cluster failed. Return code -1.
2024-06-06_15:53:25.088-0400: The previous error was detected on node x.x.x.x (headnode).

According to https://www.ibm.com/docs/en/storage-scale/5.1.9?topic=messages-6027-615:

Check the preceding messages, and consult the earlier chapters of this 
document. A frequent cause for such errors is lack of space in /var.

We have plenty of space left.

 /usr/lpp/mmfs/bin/mmlsdisk cluster
disk          driver   sector     failure holds    holds                            storage
name          type       size       group metadata data  status        availability pool
------------- -------- ------ ----------- -------- ----- ------------- ------------ ------------
S01_MDT200_1  nsd        4096         200 Yes      No    ready         up           system
S01_MDT201_1  nsd        4096         201 Yes      No    ready         up           system
S01_DAT0001_1 nsd        4096         100 No       Yes   ready         up           data1
S01_DAT0002_1 nsd        4096         101 No       Yes   ready         up           data1
S01_DAT0003_1 nsd        4096         100 No       Yes   ready         up           data1
S01_DAT0004_1 nsd        4096         101 No       Yes   ready         up           data1
S01_DAT0005_1 nsd        4096         100 No       Yes   ready         up           data1
S01_DAT0006_1 nsd        4096         101 No       Yes   ready         up           data1
S01_DAT0007_1 nsd        4096         100 No       Yes   ready         up           data1

 /usr/lpp/mmfs/bin/mmdf headnode
disk                disk size  failure holds    holds              free KB             free KB
name                    in KB    group metadata data        in full blocks        in fragments
--------------- ------------- -------- -------- ----- -------------------- -------------------
Disks in storage pool: system (Maximum disk size allowed is 14 TB)
S01_MDT200_1       1862270976      200 Yes      No        969134848 ( 52%)       2948720 ( 0%)
S01_MDT201_1       1862270976      201 Yes      No        969126144 ( 52%)       2957424 ( 0%)
                -------------                         -------------------- -------------------
(pool total)       3724541952                            1938260992 ( 52%)       5906144 ( 0%)

Disks in storage pool: data1 (Maximum disk size allowed is 578 TB)
S01_DAT0007_1     77510737920      100 No       Yes     21080752128 ( 27%)     897723392 ( 1%)
S01_DAT0005_1     77510737920      100 No       Yes     14507212800 ( 19%)     949412160 ( 1%)
S01_DAT0001_1     77510737920      100 No       Yes     14503620608 ( 19%)     951327680 ( 1%)
S01_DAT0003_1     77510737920      100 No       Yes     14509205504 ( 19%)     949340544 ( 1%)
S01_DAT0002_1     77510737920      101 No       Yes     14504585216 ( 19%)     948377536 ( 1%)
S01_DAT0004_1     77510737920      101 No       Yes     14503647232 ( 19%)     952892480 ( 1%)
S01_DAT0006_1     77510737920      101 No       Yes     14504486912 ( 19%)     949072512 ( 1%)
                -------------                         -------------------- -------------------
(pool total)     542575165440                          108113510400 ( 20%)    6598146304 ( 1%)

                =============                         ==================== ===================
(data)           542575165440                          108113510400 ( 20%)    6598146304 ( 1%)
(metadata)         3724541952                            1938260992 ( 52%)       5906144 ( 0%)
                =============                         ==================== ===================
(total)          546299707392                          110051771392 ( 22%)    6604052448 ( 1%)

Inode Information
-----------------
Total number of used inodes in all Inode spaces:          154807668
Total number of free inodes in all Inode spaces:           12964492
Total number of allocated inodes in all Inode spaces:     167772160
Total of Maximum number of inodes in all Inode spaces:    276971520

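The inode figures above can be read two ways, and a small calculation makes the difference visible. This is a minimal sketch that parses only the four "Inode Information" lines pasted above; the helper name `parse_inode_info` is made up for illustration. Measured against the maximum, about 56% of inodes are used, which matches what `df -i` reports. Measured against the inodes currently allocated, the figure is about 92%. These are totals across all inode spaces, so they cannot show whether one particular fileset has hit its own limit.

```python
import re
from typing import Dict

# The four summary lines from the mmdf output in this message.
MMDF_INODE_INFO = """\
Total number of used inodes in all Inode spaces:          154807668
Total number of free inodes in all Inode spaces:           12964492
Total number of allocated inodes in all Inode spaces:     167772160
Total of Maximum number of inodes in all Inode spaces:    276971520
"""

PATTERNS = {
    "used": r"Total number of used inodes in all Inode spaces:\s+(\d+)",
    "free": r"Total number of free inodes in all Inode spaces:\s+(\d+)",
    "allocated": r"Total number of allocated inodes in all Inode spaces:\s+(\d+)",
    "maximum": r"Total of Maximum number of inodes in all Inode spaces:\s+(\d+)",
}


def parse_inode_info(text: str) -> Dict[str, int]:
    """Extract the four inode totals from the mmdf summary text."""
    result = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"missing mmdf inode line for {key!r}")
        result[key] = int(match.group(1))
    return result


info = parse_inode_info(MMDF_INODE_INFO)
pct_of_allocated = 100.0 * info["used"] / info["allocated"]
pct_of_maximum = 100.0 * info["used"] / info["maximum"]

if __name__ == "__main__":
    print(f"used vs allocated: {pct_of_allocated:.1f}%")
    print(f"used vs maximum:   {pct_of_maximum:.1f}%")
```

The per-fileset breakdown from `mmlsfileset` is still needed to tell which inode space, if any, is actually exhausted.
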
On the head node:

df -h
Filesystem                Size  Used Avail Use% Mounted on
/dev/sda4                 430G  216G  215G  51% /
devtmpfs                   47G     0   47G   0% /dev
tmpfs                      47G     0   47G   0% /dev/shm
tmpfs                      47G  4.1G   43G   9% /run
tmpfs                      47G     0   47G   0% /sys/fs/cgroup
/dev/sda1                 504M  114M  365M  24% /boot
/dev/sda2                 100M  9.9M   90M  10% /boot/efi
x.x.x.:/nfs-share  430G  326G  105G  76% /nfs-share
cluster                      506T  405T  101T  81% /cluster
tmpfs                     9.3G     0  9.3G   0% /run/user/443748
tmpfs                     9.3G     0  9.3G   0% /run/user/547288
tmpfs                     9.3G     0  9.3G   0% /run/user/551336
tmpfs                     9.3G     0  9.3G   0% /run/user/547289

The login nodes have plenty of space in /var:
/dev/sda3        50G  8.7G   42G  18% /var

What else should we check? We are only at 81% on the GPFS-mounted file system, 
which should leave enough free space to avoid these errors. Are there any 
services you would recommend restarting?

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org