Hey,
I think there is a leak in open_fd_count.
fsal_rdwr() opens the file with fsal_open(), which increments open_fd_count,
but closes it with obj->obj_ops.close(obj), which does not decrement it.
So this counter keeps increasing, and I could easily hit the 4k hard limit
with prolonged reads/writes.
I changed it to use fsal_close(), which also does the decrement. After this
change, open_fd_count looked OK.
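For illustration, here is a minimal single-threaded model of the leak described above. It is a sketch, not Ganesha code: all toy_* names are hypothetical. It assumes that fsal_open() increments the counter, that fsal_close() decrements it, and that the raw obj_ops.close() leaves it alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the global open_fd_count. */
static uint64_t toy_open_fd_count;

/* Hypothetical stand-in for an FSAL object handle. */
struct toy_handle {
	bool is_open;
};

/* Models obj->obj_ops.close(obj): releases the fd, never touches the count. */
static void toy_raw_close(struct toy_handle *h)
{
	h->is_open = false;
}

/* Models fsal_open() (assumed behavior): opens and increments the count. */
static void toy_fsal_open(struct toy_handle *h)
{
	h->is_open = true;
	toy_open_fd_count++;
}

/* Models fsal_close(): closes via the raw op, then decrements the count. */
static void toy_fsal_close(struct toy_handle *h)
{
	toy_raw_close(h);
	assert(toy_open_fd_count > 0); /* every helper close needs a helper open */
	toy_open_fd_count--;
}

/* One read/write round trip; use_helper_close picks the close path. */
static void toy_rdwr(struct toy_handle *h, bool use_helper_close)
{
	toy_fsal_open(h);
	/* ... the actual I/O would happen here ... */
	if (use_helper_close)
		toy_fsal_close(h); /* balanced: +1 then -1 */
	else
		toy_raw_close(h);  /* leaks one count per call */
}
```

Calling toy_rdwr(&h, false) 5000 times leaves toy_open_fd_count at 5000 even though no file is open, past a 4k limit. With true, it returns to 0.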
But recently I saw open_fd_count underflow to
open_fd_count=18446744073709551615
so I suspect a double close. Any suggestions?
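The reported value is exactly 2^64 - 1 (UINT64_MAX). A 64-bit unsigned counter holds that value after one decrement from 0, which means the counter saw one more decrement than increment. Below is a small sketch of the wraparound. It assumes the counter is a 64-bit unsigned type, and plain_dec() and show_count() are illustrative names, not Ganesha functions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* The value in the log is exactly UINT64_MAX, i.e. 2^64 - 1. */
static_assert(UINT64_MAX == 18446744073709551615ULL,
	      "logged open_fd_count equals UINT64_MAX");

/* Unsigned arithmetic is modulo 2^64, so 0 - 1 wraps to UINT64_MAX. */
static uint64_t plain_dec(uint64_t v)
{
	return v - 1;
}

/* Prints the counter in the same form as the log line. */
static void show_count(uint64_t v)
{
	printf("open_fd_count=%" PRIu64 "\n", v);
}
```

show_count(plain_dec(0)) prints open_fd_count=18446744073709551615.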
Code snippet from V2.5-stable/src/FSAL/fsal_helper.c:
fsal_status_t fsal_rdwr(struct fsal_obj_handle *obj,
			fsal_io_direction_t io_direction,
			uint64_t offset, size_t io_size,
			size_t *bytes_moved, void *buffer,
			bool *eof,
			bool *sync, struct io_info *info)
{
	...
	loflags = obj->obj_ops.status(obj);
	while ((!fsal_is_open(obj))
	       || (loflags && loflags != FSAL_O_RDWR && loflags != openflags)) {
		loflags = obj->obj_ops.status(obj);
		if ((!fsal_is_open(obj))
		    || (loflags && loflags != FSAL_O_RDWR
			&& loflags != openflags)) {
			fsal_status = fsal_open(obj, openflags);
			if (FSAL_IS_ERROR(fsal_status))
				goto out;
			opened = true;
		}
		loflags = obj->obj_ops.status(obj);
	}
	...
	if ((fsal_status.major != ERR_FSAL_NOT_OPENED)
	    && (obj->obj_ops.status(obj) != FSAL_O_CLOSED)) {
		LogFullDebug(COMPONENT_FSAL,
			     "fsal_rdwr_plus: CLOSING file %p",
			     obj);
		fsal_status = obj->obj_ops.close(obj); /* use fsal_close() here? */
		if (FSAL_IS_ERROR(fsal_status)) {
			LogCrit(COMPONENT_FSAL,
				"Error closing file in fsal_rdwr_plus: %s.",
				fsal_err_txt(fsal_status));
		}
	}
	...
	if (opened) {
		fsal_status = obj->obj_ops.close(obj); /* use fsal_close() here? */
		if (FSAL_IS_ERROR(fsal_status)) {
			LogEvent(COMPONENT_FSAL,
				 "fsal_rdwr_plus: close = %s",
				 fsal_err_txt(fsal_status));
			goto out;
		}
	}
	...
}
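The excerpt above shows one way a double decrement can arise. The first close site runs whenever the handle is not FSAL_O_CLOSED, and the if (opened) site runs afterwards, so a handle this call opened is closed twice (subject to whatever the elided code in between does). With obj_ops.close() that was harmless for the counter. If both sites call fsal_close() and it decrements even for an already-closed handle (an assumption worth verifying against fsal_helper.c), then one fsal_open() is matched by two decrements.

The sketch below shows one possible fix pattern: make the accounting idempotent by decrementing only on the open-to-closed transition. All toy_* names are hypothetical, and the model is single-threaded. Real code would need the state check and the decrement done under the handle's lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the global open_fd_count. */
static uint64_t toy_open_fd_count;

/* Hypothetical handle: is_open is the only state the accounting relies on. */
struct toy_handle {
	bool is_open;
};

/* Counts only a real closed -> open transition. */
static void toy_open(struct toy_handle *h)
{
	if (h->is_open)
		return; /* already open: no new descriptor, no new count */
	h->is_open = true;
	toy_open_fd_count++;
}

/* Idempotent close: a second close on the same handle changes nothing. */
static void toy_close(struct toy_handle *h)
{
	if (!h->is_open)
		return; /* already closed: nothing to release */
	h->is_open = false;
	assert(toy_open_fd_count > 0);
	toy_open_fd_count--;
}

/* Mirrors the two close sites in the fsal_rdwr() excerpt. */
static void toy_rdwr(struct toy_handle *h)
{
	bool opened = false;

	if (!h->is_open) {
		toy_open(h);
		opened = true;
	}
	/* ... I/O ... */
	if (h->is_open)   /* first site: status != FSAL_O_CLOSED */
		toy_close(h);
	if (opened)       /* second site: the if (opened) block */
		toy_close(h); /* now a no-op instead of a second decrement */
}
```

Because toy_close() keys off the handle state rather than the call site, the two close paths can no longer both decrement for a single open.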
On Tue, Jan 2, 2018 at 12:30 AM, Malahal Naineni <mala...@gmail.com> wrote:
> The links I gave you have everything you need. You can download Gerrit
> reviews with "git review -d <number>" or from the Gerrit web GUI.
>
> "390496" is merged upstream, but the other one is not merged yet.
>
> $ git log --oneline --grep='Fix closing global file descriptors' origin/next
> 5c2efa8f0 Fix closing global file descriptors
>
> On Tue, Jan 2, 2018 at 3:22 AM, bharat singh <bharat064...@gmail.com>
> wrote:
>
>> Thanks, Malahal.
>>
>> Can you point me to these issues/fixes? I will try to patch V2.5-stable
>> and run my tests.
>>
>> Thanks,
>> Bharat
>>
>> On Mon, Jan 1, 2018 at 10:20 AM, Malahal Naineni <mala...@gmail.com>
>> wrote:
>>
>>> >> I see that mdcache keeps growing beyond the high water mark and lru
>>> reclamation can’t keep up.
>>>
>>> mdcache is different from the "FD cache". I don't think we found an issue
>>> with mdcache itself, but we found a couple of issues with the "FD cache":
>>>
>>> 1) https://review.gerrithub.io/#/c/391266/
>>> 2) https://review.gerrithub.io/#/c/390496/
>>>
>>> Neither of them is in V2.5-stable at this point. We will have to
>>> backport these and others soon.
>>>
>>> Regards, Malahal.
>>>
>>> On Mon, Jan 1, 2018 at 11:04 PM, bharat singh <bharat064...@gmail.com>
>>> wrote:
>>>
>>>> Adding nfs-ganesha-support.
>>>>
>>>>
>>>> On Fri, Dec 29, 2017 at 11:01 AM, bharat singh <bharat064...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>>
>>>>> I am testing the Ganesha NFSv3 implementation with the nfstest_io tool.
>>>>> I see that mdcache keeps growing beyond the high water mark and LRU
>>>>> reclamation can’t keep up.
>>>>>
>>>>>
>>>>> [cache_lru] lru_run :INODE LRU :CRIT :Futility count exceeded. The LRU thread is unable to make progress in reclaiming FDs. Disabling FD cache.
>>>>>
>>>>> mdcache_lru_fds_available :INODE LRU :INFO :FDs above high water mark, waking LRU thread. open_fd_count=14196, lru_state.fds_hiwat=3686, lru_state.fds_lowat=2048, lru_state.fds_hard_limit=4055
>>>>>
>>>>>
>>>>> I am on Ganesha V2.5.2 with the default config settings.
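The three thresholds in the log are consistent with 99%, 90%, and 50% of a 4096-descriptor limit under integer division:

- 4096 * 99 / 100 = 4055
- 4096 * 90 / 100 = 3686
- 4096 * 50 / 100 = 2048

Both the 4096 limit and the percentages are inferred from these numbers, not read from the configuration. If that inference holds, open_fd_count=14196 is about 3.5 times the number of descriptors the process could hold. That points at counter drift rather than real open files. The compile-time checks below encode that arithmetic; all macro names are hypothetical.

```c
#include <assert.h>

/* Values copied from the mdcache_lru_fds_available log line. */
#define LOG_FDS_HARD_LIMIT 4055u
#define LOG_FDS_HIWAT      3686u
#define LOG_FDS_LOWAT      2048u
#define LOG_OPEN_FD_COUNT  14196u

/* Assumption: a per-process descriptor limit of 4096 (not in the log). */
#define ASSUMED_FD_LIMIT   4096u

/* Integer percentage of a limit; the division truncates toward zero. */
#define PCT_OF(limit, percent) (((limit) * (percent)) / 100u)

static_assert(PCT_OF(ASSUMED_FD_LIMIT, 99u) == LOG_FDS_HARD_LIMIT,
	      "99% of 4096 matches fds_hard_limit");
static_assert(PCT_OF(ASSUMED_FD_LIMIT, 90u) == LOG_FDS_HIWAT,
	      "90% of 4096 matches fds_hiwat");
static_assert(PCT_OF(ASSUMED_FD_LIMIT, 50u) == LOG_FDS_LOWAT,
	      "50% of 4096 matches fds_lowat");

/* The logged count exceeds the assumed limit, so it cannot all be real fds. */
static_assert(LOG_OPEN_FD_COUNT > ASSUMED_FD_LIMIT,
	      "open_fd_count exceeds the assumed descriptor limit");
```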
>>>>>
>>>>>
>>>>> So, a couple of questions:
>>>>>
>>>>> 1. Is Ganesha tested against tools like this, which do many opens and
>>>>> closes in quick succession?
>>>>>
>>>>> 2. Is there a way to suppress these error messages and/or speed up
>>>>> LRU reclamation?
>>>>>
>>>>> 3. Any suggestions on using these kinds of tools with Ganesha?
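This bears on question 2. If the counter itself has drifted, speeding up reclamation cannot help, because closing real descriptors never brings a phantom count back under the high water mark. A toy model of that interaction follows. It is not Ganesha's lru_run(), and every name in it is hypothetical. Its futility rule (give up after TOY_FUTILITY_LIMIT consecutive passes without progress) is an assumption inspired only by the "Futility count exceeded" and "Disabling FD cache" messages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit on consecutive passes that make no progress. */
#define TOY_FUTILITY_LIMIT 8u

struct toy_lru {
	uint64_t open_fd_count; /* what the counter claims is open */
	uint64_t reclaimable;   /* descriptors the LRU can actually close */
	uint64_t fds_hiwat;
	unsigned futility;
	bool fd_cache_enabled;
};

/* One reclamation pass: close one real fd if possible, else count futility. */
static void toy_lru_pass(struct toy_lru *lru)
{
	/* Real, closable descriptors can never exceed what is counted. */
	assert(lru->reclaimable <= lru->open_fd_count);

	if (!lru->fd_cache_enabled || lru->open_fd_count <= lru->fds_hiwat)
		return;

	if (lru->reclaimable == 0) {
		/* Nothing real to close: the (leaked) count cannot move. */
		if (++lru->futility > TOY_FUTILITY_LIMIT)
			lru->fd_cache_enabled = false; /* cf. "Disabling FD cache." */
		return;
	}

	lru->reclaimable--;
	lru->open_fd_count--;
	lru->futility = 0; /* progress was made */
}
```

With a leaked count of 14196 and nothing reclaimable, the model disables its FD cache after nine passes and the count never moves. With a real count of 4000 and 1000 closable fds, it settles at the 3686 high water mark.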
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Bharat
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> -Bharat
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------
>>>> ------------------
>>>> Check out the vibrant tech community on one of the world's most
>>>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>>> _______________________________________________
>>>> Nfs-ganesha-devel mailing list
>>>> Nfs-ganesha-devel@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>>>>
>>>>
>>>
>>
>>
>> --
>> -Bharat
>>
>>
>>
>
--
-Bharat