Hello Malhal,

Thank you. Actually I don’t see the parameter Cache_FDs in our ganesha config. 
But when I trace LRU processing I see that almost no FDs get released. And the 
number of FDs given in the log messages doesn’t match what I see in /proc/<pid 
of ganesha>/fd/. I see 512k open files while the logfile give 600k. Even 4hours 
since the I suspended the node and all i/o activity stopped I see 500k open 
files and LRU processing doesn’t close any of them.

This looks like a bug in gpfs.nfs-ganesha-2.5.3-ibm036.10.el7. I’ll open a case 
with IBM. We did see gansha to fail to open new files and hence client requests 
to fail. I assume that 500K FDs compared to 10K FDs as before create some 
notable overhead for ganesha, spectrum scale and kernel and withdraw resources 
from samba.

I’ll post to the list once we got some results.

Cheers,

Heiner


Start of LRU processing

2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : 
gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :Reaping up to 50 
entries from lane 1027
2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : 
gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG 
:formeropen=607025 totalwork=0 workpass=51350 totalclosed:6
2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : 
gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up 
to 50 entries from lane 1027
2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : 
gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually 
processed 50 entries on lane 1027 closing 0 descriptors
2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : 
gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :Reaping up to 50 
entries from lane 1028
2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : 
gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG 
:formeropen=607025 totalwork=0 workpass=51400 totalclosed:6
2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : 
gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up 
to 50 entries from lane 1028
2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : 
gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually 
processed 50 entries on lane 1028 closing 0 descriptors

End of log
2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : 
gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up 
to 50 entries from lane 1029
2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : 
gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually 
processed 50 entries on lane 1029 closing 0 descriptors
2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : 
gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :Reaping up to 50 
entries from lane 1030
2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : 
gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG 
:formeropen=607025 totalwork=0 workpass=51500 totalclosed:6
2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : 
gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up 
to 50 entries from lane 1030
2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : 
gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually 
processed 50 entries on lane 1030 closing 0 descriptors
2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : 
gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :After work, 
open_fd_count:607024  count:29503718 fdrate:1908874353 threadwait=9
2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : 
gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG 
:currentopen=607024 futility=0 totalwork=51550 biggest_window=335544 extremis=0 
lanes=1031 fds_lowat=167772

From: <[email protected]> on behalf of Malahal R Naineni 
<[email protected]>
Reply to: gpfsug main discussion list <[email protected]>
Date: Thursday, 19 September 2019 at 20:39
To: "[email protected]" <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this 
unusual?

NFSv3 doesn't have open/close requests, so nfs-ganesha opens a file for 
read/write when there is an NFSv3 read/write request. It does cache file 
descriptors, so its open count can be very large. If you have 'Cache_FDs = 
true" in your config, ganesha aggressively caches file descriptors.

Taking traces with COMPONENT_CACHE_INODE_LRU level set to full debug should 
give us better insight on what is happening when the the open file descriptors 
count is very high.

When the I/O failure happens or when the open fd count is high, you could do 
the following:

1. ganesha_mgr set_log COMPONENT_CACHE_INODE_LRU FULL_DEBUG
2. wait for 90 seconds, then run
3. ganesha_mgr set_log COMPONENT_CACHE_INODE_LRU EVENT

Regards, Malahal.

----- Original message -----
From: "Billich Heinrich Rainer (ID SD)" <[email protected]>
Sent by: [email protected]
To: gpfsug main discussion list <[email protected]>
Cc:
Subject: [EXTERNAL] [gpfsug-discuss] Ganesha daemon has 400'000 open files - is 
this unusual?
Date: Thu, Sep 19, 2019 7:51 PM





Hello,



Is it usual to see 200’000-400’000 open files for a single ganesha process? Or 
does this indicate that something ist wrong?



We have some issues with ganesha (on spectrum scale protocol nodes)  reporting 
NFS3ERR_IO in the log. I noticed that the affected nodes have a large number of 
open files, 200’000-400’000 open files per daemon (and 500 threads and about 
250 client connections). Other nodes have 1’000 – 10’000 open files by ganesha 
only and don’t show the issue.



If someone could explain how ganesha decides which files to keep open and which 
to close that would help, too. As NFSv3 is stateless the client doesn’t 
open/close a file, it’s the server to decide when to close it? We do have a few 
NFSv4 clients, too.



Are there certain access patterns that can trigger such a large number of open 
file? Maybe traversing and reading a large number of small files?



Thank you,

Heiner



I did count the open files  by counting the entries in /proc/<pid of 
ganesha>/fd/ . With several 100k entries I failed to do a ‘ls -ls’ to list all 
the symbolic links, hence I can’t relate the open files to different exports 
easily.



I did post this to the ganesha mailing list, too.

--

=======================

Heinrich Billich

ETH Zürich

Informatikdienste

Tel.: +41 44 632 72 56

[email protected]

========================




_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

Reply via email to