See https://review.gerrithub.io/#/c/391267/ for the GPFS FSAL. You could do
something similar for the VFS FSAL if that is the one you are using.
Regards, Malahal.
On Thu, Feb 15, 2018 at 1:19 AM, bharat singh <bharat064...@gmail.com>
wrote:
> Yeah, that worked and I don't see this going below -1. So initializing it
> to a non-zero value has avoided this for now.
>
> But I still see the 4k fd limit being exhausted after 24hrs of IO. My
> setup currently shows open_fd_count=13k but there are only 30 files.
> # ls -al /proc/25832/fd | wc -l
> 559
>
> Also /proc gives no clue. So I still believe there are more leaks in
> this counter than the one I saw in fsal_rdwr().
> Regarding the proper fix, when would it be available for us to try out?
>
>
> On Mon, Feb 12, 2018 at 10:10 AM, Malahal Naineni <mala...@gmail.com>
> wrote:
>
>> Technically you should use an atomic fetch to read it, at least on some
>> archs. Also, your assertion might not be hit even if the atomic ops are
>> working right. In fact, they had better be working correctly.
>>
>> As an example, say the count is 1 and both threads pass the assertion
>> check. Then both threads decrement and the end value would be -1. If you
>> want to catch this in an assert, please use the return value of the atomic
>> decrement operation for the assertion.
>>
>>
>>
>> On Mon, Feb 12, 2018 at 9:55 PM bharat singh <bharat064...@gmail.com>
>> wrote:
>>
>>> Yeah. Looks like lock-free updates to open_fd_count are creating the
>>> issue.
>>> There is no double close, as I couldn’t hit the assert(open_fd_count >
>>> 0) I had added before the decrements.
>>>
>>> And once it hits this state, it ping-pongs between 0 & ULLONG_MAX.
>>>
>>> So as a workaround I have initialized open_fd_count = <num of worker
>>> thds> to avoid these racy decrements. I haven’t seen the warnings after
>>> this change over a couple of hours of testing.
>>>
>>>
>>>
>>> [work-162] fsal_open :FSAL :CRIT :before increment open_fd_count0
>>> [work-162] fsal_open :FSAL :CRIT :after increment open_fd_count1
>>> [work-128] fsal_close :FSAL :CRIT :before decrement open_fd_count1
>>> [work-128] fsal_close :FSAL :CRIT :after decrement open_fd_count0
>>> [work-153] fsal_open :FSAL :CRIT :before increment open_fd_count0
>>> [work-153] fsal_open :FSAL :CRIT :after increment open_fd_count1
>>> [work-153] fsal_close :FSAL :CRIT :before decrement open_fd_count1
>>> [work-162] fsal_close :FSAL :CRIT :before decrement open_fd_count1
>>> [work-153] fsal_close :FSAL :CRIT :after decrement open_fd_count0
>>> [work-162] fsal_close :FSAL :CRIT :after decrement open_fd_count18446744073709551615
>>> [work-148] mdcache_lru_fds_available :INODE LRU :CRIT :FD Hard Limit Exceeded. Disabling FD Cache and waking LRU thread. open_fd_count=18446744073709551615, fds_hard_limit=4055
>>>
>>> [work-111] fsal_open :FSAL :CRIT :before increment open_fd_count18446744073709551615
>>> [work-111] fsal_open :FSAL :CRIT :after increment open_fd_count0
>>> [cache_lru] lru_run :INODE LRU :EVENT :Re-enabling FD cache.
>>> [work-111] fsal_close :FSAL :CRIT :before decrement open_fd_count0
>>> [work-111] fsal_close :FSAL :CRIT :after decrement open_fd_count18446744073709551615
>>>
>>> -bharat
>>>
>>> On Sun, Feb 11, 2018 at 10:32 PM, Frank Filz <ffilz...@mindspring.com>
>>> wrote:
>>>
>>>> Yea, open_fd_count is broken…
>>>>
>>>>
>>>>
>>>> We have been working on the right way to fix it.
>>>>
>>>>
>>>>
>>>> Frank
>>>>
>>>>
>>>>
>>>> *From:* bharat singh [mailto:bharat064...@gmail.com]
>>>> *Sent:* Saturday, February 10, 2018 7:42 PM
>>>> *To:* Malahal Naineni <mala...@gmail.com>
>>>> *Cc:* nfs-ganesha-devel@lists.sourceforge.net
>>>> *Subject:* Re: [Nfs-ganesha-devel] Ganesha V2.5.2: mdcache high open_fd_count
>>>>
>>>>
>>>>
>>>> Hey,
>>>>
>>>>
>>>>
>>>> I think there is a leak in open_fd_count.
>>>>
>>>>
>>>>
>>>> fsal_rdwr() uses fsal_open() to open the file, but uses
>>>> obj->obj_ops.close(obj) to close it, so there is no decrement of
>>>> open_fd_count.
>>>>
>>>> So this counter keeps increasing and I could easily hit the 4k hard
>>>> limit with prolonged read/writes.
>>>>
>>>>
>>>>
>>>> I changed it to use fsal_close(), as that also does the decrement. After
>>>> this change open_fd_count looked OK.
>>>>
>>>> But recently I saw open_fd_count underflow to
>>>> open_fd_count=18446744073709551615
>>>>
>>>>
>>>>
>>>> So I suspect a double close. Any suggestions?
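>>>>
>>>> The shape of the pairing, as I understand it (a rough sketch from
>>>> memory, simplified -- not the exact Ganesha source):
>>>>
>>>>     /* fsal_open() bumps open_fd_count on success; fsal_close() is
>>>>      * the helper that undoes it around the FSAL's close. Calling
>>>>      * obj->obj_ops.close(obj) directly skips the decrement, so the
>>>>      * counter leaks upward. */
>>>>     fsal_status_t fsal_close(struct fsal_obj_handle *obj)
>>>>     {
>>>>             fsal_status_t st = obj->obj_ops.close(obj);
>>>>
>>>>             if (!FSAL_IS_ERROR(st))
>>>>                     /* Must be an atomic decrement to stay correct
>>>>                      * under concurrent opens/closes. */
>>>>                     (void) __atomic_sub_fetch(&open_fd_count, 1,
>>>>                                               __ATOMIC_SEQ_CST);
>>>>
>>>>             return st;
>>>>     }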
>>>>
>>>>
>>>>
>>>> Code snippet from V2.5-stable/src/FSAL/fsal_helper.c:
>>>>
>>>> fsal_status_t fsal_rdwr(struct fsal_obj_handle *obj,
>>>>                         fsal_io_direction_t io_direction,
>>>>                         uint64_t offset, size_t io_size,
>>>>                         size_t *bytes_moved, void *buffer,
>>>>                         bool *eof,
>>>>                         bool *sync, struct io_info *info)
>>>> {
>>>>         ...
>>>>         loflags = obj->obj_ops.status(obj);
>>>>         while ((!fsal_is_open(obj))
>>>>                || (loflags && loflags != FSAL_O_RDWR
>>>>                    && loflags != openflags)) {
>>>>                 loflags = obj->obj_ops.status(obj);
>>>>                 if ((!fsal_is_open(obj))
>>>>                     || (loflags && loflags != FSAL_O_RDWR
>>>>                         && loflags != openflags)) {
>>>>                         fsal_status = fsal_open(obj, openflags);
>>>>                         if (FSAL_IS_ERROR(fsal_status))
>>>>                                 goto out;
>>>>                         opened = true;
>>>>                 }
>>>>                 loflags = obj->obj_ops.status(obj);
>>>>         }
>>>>         ...
>>>>         if ((fsal_status.major != ERR_FSAL_NOT_OPENED)
>>>>             && (obj->obj_ops.status(obj) != FSAL_O_CLOSED)) {
>>>>                 LogFullDebug(COMPONENT_FSAL,
>>>>                              "fsal_rdwr_plus: CLOSING file %p",
>>>>                              obj);
>>>>
>>>>                 fsal_status = obj->obj_ops.close(obj);  /* <-- using fsal_close here? */
>>>>                 if (FSAL_IS_ERROR(fsal_status)) {
>>>>                         LogCrit(COMPONENT_FSAL,
>>>>                                 "Error closing file in fsal_rdwr_plus: %s.",
>>>>                                 fsal_err_txt(fsal_status));
>>>>                 }
>>>>         }
>>>>         ...
>>>>         if (opened) {
>>>>                 fsal_status = obj->obj_ops.close(obj);  /* <-- using fsal_close here? */
>>>>                 if (FSAL_IS_ERROR(fsal_status)) {
>>>>                         LogEvent(COMPONENT_FSAL,
>>>>                                  "fsal_rdwr_plus: close = %s",
>>>>                                  fsal_err_txt(fsal_status));
>>>>                         goto out;
>>>>                 }
>>>>         }
>>>>         ...
>>>> }
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Jan 2, 2018 at 12:30 AM, Malahal Naineni <mala...@gmail.com>
>>>> wrote:
>>>>
>>>> The links I gave you have everything you need. You should be able to
>>>> download gerrit reviews with "git review -d <number>" or from the gerrit
>>>> web GUI.
>>>>
>>>>
>>>>
>>>> "390496" is merged upstream, but the other one is not merged yet.
>>>>
>>>>
>>>>
>>>> $ git log --oneline --grep='Fix closing global file descriptors' origin/next
>>>>
>>>> 5c2efa8f0 Fix closing global file descriptors
>>>>
>>>> On Tue, Jan 2, 2018 at 3:22 AM, bharat singh <bharat064...@gmail.com>
>>>> wrote:
>>>>
>>>> Thanks Malahal
>>>>
>>>>
>>>>
>>>> Can you point me to these issues/fixes. I will try to patch V2.5-stable
>>>> and run my tests.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Bharat
>>>>
>>>>
>>>>
>>>> On Mon, Jan 1, 2018 at 10:20 AM, Malahal Naineni <mala...@gmail.com>
>>>> wrote:
>>>>
>>>> >> I see that mdcache keeps growing beyond the high water mark and lru
>>>> reclamation can’t keep up.
>>>>
>>>>
>>>>
>>>> mdcache is different from the FD cache. I don't think we found an issue
>>>> with mdcache itself. We found a couple of issues with the FD cache:
>>>>
>>>>
>>>>
>>>> 1) https://review.gerrithub.io/#/c/391266/
>>>>
>>>> 2) https://review.gerrithub.io/#/c/390496/
>>>>
>>>>
>>>>
>>>> Neither of them is in V2.5-stable at this point. We will have to
>>>> backport these and others soon.
>>>>
>>>>
>>>>
>>>> Regards, Malahal.
>>>>
>>>>
>>>>
>>>> On Mon, Jan 1, 2018 at 11:04 PM, bharat singh <bharat064...@gmail.com>
>>>> wrote:
>>>>
>>>> Adding nfs-ganesha-support..
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Dec 29, 2017 at 11:01 AM, bharat singh <bharat064...@gmail.com>
>>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>>
>>>>
>>>> I am testing the NFSv3 Ganesha implementation against the nfstest_io
>>>> tool. I see that mdcache keeps growing beyond the high water mark and LRU
>>>> reclamation can’t keep up.
>>>>
>>>>
>>>>
>>>> [cache_lru] lru_run :INODE LRU :CRIT :Futility count exceeded. The LRU thread is unable to make progress in reclaiming FDs. Disabling FD cache.
>>>>
>>>> mdcache_lru_fds_available :INODE LRU :INFO :FDs above high water mark, waking LRU thread. open_fd_count=14196, lru_state.fds_hiwat=3686, lru_state.fds_lowat=2048, lru_state.fds_hard_limit=4055
>>>>
>>>>
>>>>
>>>> I am on Ganesha V2.5.2 with default config settings
>>>>
>>>>
>>>>
>>>> So, a couple of questions:
>>>>
>>>> 1. Is Ganesha tested against these kinds of tools, which do a bunch of
>>>> opens and closes in quick succession?
>>>>
>>>> 2. Is there a way to suppress these error messages and/or expedite the
>>>> LRU reclamation process?
>>>>
>>>> 3. Any suggestions regarding the usage of these kinds of tools with
>>>> Ganesha?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Bharat
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> -Bharat
>>>>
>>>> --
>>>>
>>>> -Bharat
>>>>
>>>> --
>>>>
>>>> -Bharat
>>>>
>>>
>>>
>>>
>>> --
>>> -Bharat
>>>
>>>
>>>
>
>
> --
> -Bharat
>
>
>