Hey,

I think there is a leak in open_fd_count.

fsal_rdwr() opens the file with fsal_open(), but closes it with
obj->obj_ops.close(obj), which does not decrement open_fd_count.
So the counter keeps increasing, and with prolonged reads/writes I could
easily hit the 4k hard limit.

I changed it to use fsal_close(), since that also does the decrement. After
this change open_fd_count looked OK.
But recently I saw open_fd_count underflow to
open_fd_count=18446744073709551615

So I suspect a double close. Any suggestions?

Code snippet from V2.5-stable/src/FSAL/fsal_helper.c:

fsal_status_t fsal_rdwr(struct fsal_obj_handle *obj,
			fsal_io_direction_t io_direction,
			uint64_t offset, size_t io_size,
			size_t *bytes_moved, void *buffer,
			bool *eof,
			bool *sync, struct io_info *info)
{
	...
	loflags = obj->obj_ops.status(obj);
	while ((!fsal_is_open(obj))
	       || (loflags && loflags != FSAL_O_RDWR && loflags != openflags)) {
		loflags = obj->obj_ops.status(obj);
		if ((!fsal_is_open(obj))
		    || (loflags && loflags != FSAL_O_RDWR
			&& loflags != openflags)) {
			fsal_status = fsal_open(obj, openflags);
			if (FSAL_IS_ERROR(fsal_status))
				goto out;
			opened = true;
		}
		loflags = obj->obj_ops.status(obj);
	}
	...
	if ((fsal_status.major != ERR_FSAL_NOT_OPENED)
	    && (obj->obj_ops.status(obj) != FSAL_O_CLOSED)) {
		LogFullDebug(COMPONENT_FSAL,
			     "fsal_rdwr_plus: CLOSING file %p",
			     obj);

		fsal_status = obj->obj_ops.close(obj); /* <<< using fsal_close here? */
		if (FSAL_IS_ERROR(fsal_status)) {
			LogCrit(COMPONENT_FSAL,
				"Error closing file in fsal_rdwr_plus: %s.",
				fsal_err_txt(fsal_status));
		}
	}
	...
	if (opened) {
		fsal_status = obj->obj_ops.close(obj); /* <<< using fsal_close here? */
		if (FSAL_IS_ERROR(fsal_status)) {
			LogEvent(COMPONENT_FSAL,
				 "fsal_rdwr_plus: close = %s",
				 fsal_err_txt(fsal_status));
			goto out;
		}
	}
	...
}


On Tue, Jan 2, 2018 at 12:30 AM, Malahal Naineni <mala...@gmail.com> wrote:

> The links I gave you have everything you need. You should be able to
> download the gerrit reviews with "git review -d <number>" or from the
> gerrit web GUI.
>
> "390496" is merged upstream, but the other one is not merged yet.
>
> $ git log --oneline --grep='Fix closing global file descriptors' origin/next
> 5c2efa8f0 Fix closing global file descriptors
>
> On Tue, Jan 2, 2018 at 3:22 AM, bharat singh <bharat064...@gmail.com>
> wrote:
>
>> Thanks Malahal
>>
>> Can you point me to these issues/fixes? I will try to patch V2.5-stable
>> with them and run my tests.
>>
>> Thanks,
>> Bharat
>>
>> On Mon, Jan 1, 2018 at 10:20 AM, Malahal Naineni <mala...@gmail.com>
>> wrote:
>>
>>> >> I see that mdcache keeps growing beyond the high water mark and lru
>>> reclamation can’t keep up.
>>>
>>> mdcache is different from the "FD" cache. I don't think we found an issue
>>> with mdcache itself. We found a couple of issues with the "FD cache":
>>>
>>> 1) https://review.gerrithub.io/#/c/391266/
>>> 2) https://review.gerrithub.io/#/c/390496/
>>>
>>> Neither of them is in V2.5-stable at this point. We will have to
>>> backport these and others soon.
>>>
>>> Regards, Malahal.
>>>
>>> On Mon, Jan 1, 2018 at 11:04 PM, bharat singh <bharat064...@gmail.com>
>>> wrote:
>>>
>>>> Adding nfs-ganesha-support.
>>>>
>>>>
>>>> On Fri, Dec 29, 2017 at 11:01 AM, bharat singh <bharat064...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>>
>>>>> I am testing the Ganesha NFSv3 implementation with the nfstest_io tool. I
>>>>> see that mdcache keeps growing beyond the high water mark and LRU
>>>>> reclamation can’t keep up.
>>>>>
>>>>>
>>>>> [cache_lru] lru_run :INODE LRU :CRIT :Futility count exceeded.  The
>>>>> LRU thread is unable to make progress in reclaiming FDs.  Disabling FD
>>>>> cache.
>>>>>
>>>>> mdcache_lru_fds_available :INODE LRU :INFO :FDs above high water mark,
>>>>> waking LRU thread. open_fd_count=14196, lru_state.fds_hiwat=3686,
>>>>> lru_state.fds_lowat=2048, lru_state.fds_hard_limit=4055
>>>>>
>>>>>
>>>>> I am on Ganesha V2.5.2 with the default config settings.
>>>>>
>>>>>
>>>>> So, a couple of questions:
>>>>>
>>>>> 1. Is Ganesha tested against this kind of tool, which does a lot
>>>>> of opens/closes in quick succession?
>>>>>
>>>>> 2. Is there a way to suppress these error messages and/or speed up the
>>>>> LRU reclamation process?
>>>>>
>>>>> 3. Any suggestions on using this kind of tool with
>>>>> Ganesha?
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Bharat
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> -Bharat
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------
>>>> ------------------
>>>> Check out the vibrant tech community on one of the world's most
>>>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>>> _______________________________________________
>>>> Nfs-ganesha-devel mailing list
>>>> Nfs-ganesha-devel@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>>>>
>>>>
>>>
>>
>>
>> --
>> -Bharat
>>
>>
>>
>


-- 
-Bharat