Yeah, open_fd_count is broken…
We have been working on the right way to fix it.
Frank
From: bharat singh [mailto:bharat064...@gmail.com]
Sent: Saturday, February 10, 2018 7:42 PM
To: Malahal Naineni <mala...@gmail.com>
Cc: nfs-ganesha-devel@lists.sourceforge.net
Subject: Re: [Nfs-ganesha-devel] Ganesha V2.5.2: mdcache high open_fd_count
Hey,
I think there is a leak in open_fd_count.
fsal_rdwr() uses fsal_open() to open the file, but uses obj->obj_ops.close(obj)
to close it, so open_fd_count is never decremented.
So this counter keeps increasing, and I could easily hit the 4k hard limit with
prolonged reads/writes.
I changed it to use fsal_close(), since that also does the decrement. After this
change open_fd_count looked OK.
But recently I saw open_fd_count underflow to
open_fd_count=18446744073709551615
so I suspect a double close. Any suggestions?
Code snippet from V2.5-stable/src/FSAL/fsal_helper.c:
fsal_status_t fsal_rdwr(struct fsal_obj_handle *obj,
                        fsal_io_direction_t io_direction,
                        uint64_t offset, size_t io_size,
                        size_t *bytes_moved, void *buffer,
                        bool *eof,
                        bool *sync, struct io_info *info)
{
    ...
    loflags = obj->obj_ops.status(obj);
    while ((!fsal_is_open(obj))
           || (loflags && loflags != FSAL_O_RDWR && loflags != openflags)) {
        loflags = obj->obj_ops.status(obj);
        if ((!fsal_is_open(obj))
            || (loflags && loflags != FSAL_O_RDWR
                && loflags != openflags)) {
            fsal_status = fsal_open(obj, openflags);
            if (FSAL_IS_ERROR(fsal_status))
                goto out;
            opened = true;
        }
        loflags = obj->obj_ops.status(obj);
    }
    ...
    if ((fsal_status.major != ERR_FSAL_NOT_OPENED)
        && (obj->obj_ops.status(obj) != FSAL_O_CLOSED)) {
        LogFullDebug(COMPONENT_FSAL,
                     "fsal_rdwr_plus: CLOSING file %p",
                     obj);
        fsal_status = obj->obj_ops.close(obj); /* >>> use fsal_close() here? */
        if (FSAL_IS_ERROR(fsal_status)) {
            LogCrit(COMPONENT_FSAL,
                    "Error closing file in fsal_rdwr_plus: %s.",
                    fsal_err_txt(fsal_status));
        }
    }
    ...
    if (opened) {
        fsal_status = obj->obj_ops.close(obj); /* >>> use fsal_close() here? */
        if (FSAL_IS_ERROR(fsal_status)) {
            LogEvent(COMPONENT_FSAL,
                     "fsal_rdwr_plus: close = %s",
                     fsal_err_txt(fsal_status));
            goto out;
        }
    }
    ...
}
On Tue, Jan 2, 2018 at 12:30 AM, Malahal Naineni <mala...@gmail.com> wrote:
The links I gave you have everything you need. You should be able to
download Gerrit reviews with "git review -d <number>" or from the Gerrit
web GUI.
Review 390496 is merged upstream, but the other one (391266) is not merged yet.
$ git log --oneline --grep='Fix closing global file descriptors' origin/next
5c2efa8f0 Fix closing global file descriptors
On Tue, Jan 2, 2018 at 3:22 AM, bharat singh <bharat064...@gmail.com> wrote:
Thanks, Malahal.
Can you point me to these issues/fixes? I will try to patch V2.5-stable and run
my tests.
Thanks,
Bharat
On Mon, Jan 1, 2018 at 10:20 AM, Malahal Naineni <mala...@gmail.com> wrote:
>> I see that mdcache keeps growing beyond the high water mark and lru
>> reclamation can’t keep up.
mdcache is different from the "FD cache". I don't think we have found an issue
with mdcache itself, but we found a couple of issues with the "FD cache":
1) https://review.gerrithub.io/#/c/391266/
2) https://review.gerrithub.io/#/c/390496/
Neither of them is in V2.5-stable at this point. We will have to backport
these and others soon.
Regards, Malahal.
On Mon, Jan 1, 2018 at 11:04 PM, bharat singh <bharat064...@gmail.com> wrote:
Adding nfs-ganesha-support.
On Fri, Dec 29, 2017 at 11:01 AM, bharat singh <bharat064...@gmail.com> wrote:
Hello,
I am testing the Ganesha NFSv3 implementation with the nfstest_io tool. I see
that mdcache keeps growing beyond the high water mark and LRU reclamation can’t
keep up.
[cache_lru] lru_run :INODE LRU :CRIT :Futility count exceeded. The LRU thread is unable to make progress in reclaiming FDs. Disabling FD cache.
mdcache_lru_fds_available :INODE LRU :INFO :FDs above high water mark, waking LRU thread. open_fd_count=14196, lru_state.fds_hiwat=3686, lru_state.fds_lowat=2048, lru_state.fds_hard_limit=4055
I am on Ganesha V2.5.2 with default config settings.
So, a couple of questions:
1. Is Ganesha tested against these kinds of tools, which do a lot of
opens/closes in quick succession?
2. Is there a way to suppress these error messages and/or speed up LRU
reclamation?
3. Any suggestions regarding the use of these kinds of tools with Ganesha?
Thanks,
Bharat
--
-Bharat
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
<mailto:Nfs-ganesha-devel@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel