Technically, you should use an atomic fetch to read it, at least on some
architectures. Also, your assertion might not be hit even when the atomic
ops are working correctly (and they had better be).

As an example, say the counter is 1 and both threads check the assertion;
it passes for both. Then both threads decrement, and the end value is -1.
If you want to catch this with an assert, please use the return value of
the atomic decrement operation in the assertion.
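
For illustration, here is a minimal sketch of that pattern using C11
stdatomic (Ganesha has its own atomic wrappers, and the helper name here
is made up):

    #include <assert.h>
    #include <stdatomic.h>
    #include <stdint.h>

    static atomic_uint_least64_t open_fd_count;

    /* hypothetical helper: decrement and check in one atomic step */
    static void fd_count_decrement(void)
    {
            /* atomic_fetch_sub returns the value held *before* the
             * subtraction, so the assert checks exactly the value this
             * thread decremented; a separate read-then-assert can race. */
            uint64_t prev = atomic_fetch_sub(&open_fd_count, 1);
            assert(prev > 0);
    }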



On Mon, Feb 12, 2018 at 9:55 PM bharat singh <bharat064...@gmail.com> wrote:

> Yeah. It looks like lock-free updates to open_fd_count are creating the
> issue. There is no double close, as I couldn't hit the
> assert(open_fd_count > 0) I added before the decrements.
>
> And once it hits this state, it ping-pongs between 0 & ULLONG_MAX.
>
> So as a workaround I have initialized open_fd_count = <num of worker thds>
> to avoid these racy decrements. I haven't seen the warnings after this
> change over a couple of hours of testing.
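>
> For reference, a plain (non-atomic) decrement compiles to a separate
> load, subtract, and store, so two threads can interleave as in this
> hypothetical sketch (illustrative only, not the actual code):
>
>     /* open_fd_count is a plain uint64_t holding 1 */
>     uint64_t a = open_fd_count;  /* thread A reads 1 */
>     uint64_t b = open_fd_count;  /* thread B reads 1 */
>     open_fd_count = a - 1;       /* thread A stores 0 */
>     open_fd_count = b - 1;       /* thread B stores 0: one close is lost */
>
>     /* and once the counter sits at 0, any extra decrement wraps the
>      * unsigned value to ULLONG_MAX, matching the log below */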
>
>
>
> [work-162] fsal_open :FSAL :CRIT :before increment open_fd_count0
> [work-162] fsal_open :FSAL :CRIT :after increment open_fd_count1
> [work-128] fsal_close :FSAL :CRIT :before decrement open_fd_count1
> [work-128] fsal_close :FSAL :CRIT :after decrement open_fd_count0
> [work-153] fsal_open :FSAL :CRIT :before increment open_fd_count0
> [work-153] fsal_open :FSAL :CRIT :after increment open_fd_count1
> [work-153] fsal_close :FSAL :CRIT :before decrement open_fd_count1
> [work-162] fsal_close :FSAL :CRIT :before decrement open_fd_count1
> [work-153] fsal_close :FSAL :CRIT :after decrement open_fd_count0
> [work-162] fsal_close :FSAL :CRIT :after decrement open_fd_count18446744073709551615
> [work-148] mdcache_lru_fds_available :INODE LRU :CRIT :FD Hard Limit Exceeded.  Disabling FD Cache and waking LRU thread. open_fd_count=18446744073709551615, fds_hard_limit=4055
>
> [work-111] fsal_open :FSAL :CRIT :before increment open_fd_count18446744073709551615
> [work-111] fsal_open :FSAL :CRIT :after increment open_fd_count0
> [cache_lru] lru_run :INODE LRU :EVENT :Re-enabling FD cache.
> [work-111] fsal_close :FSAL :CRIT :before decrement open_fd_count0
> [work-111] fsal_close :FSAL :CRIT :after decrement open_fd_count18446744073709551615
>
> -bharat
>
> On Sun, Feb 11, 2018 at 10:32 PM, Frank Filz <ffilz...@mindspring.com>
> wrote:
>
>> Yea, open_fd_count is broken…
>>
>>
>>
>> We have been working on the right way to fix it.
>>
>>
>>
>> Frank
>>
>>
>>
>> *From:* bharat singh [mailto:bharat064...@gmail.com]
>> *Sent:* Saturday, February 10, 2018 7:42 PM
>> *To:* Malahal Naineni <mala...@gmail.com>
>> *Cc:* nfs-ganesha-devel@lists.sourceforge.net
>> *Subject:* Re: [Nfs-ganesha-devel] Ganesha V2.5.2: mdcache high
>> open_fd_count
>>
>>
>>
>> Hey,
>>
>>
>>
>> I think there is a leak in open_fd_count.
>>
>>
>>
>> fsal_rdwr() uses fsal_open() to open the file, but it uses
>> obj->obj_ops.close(obj) to close it, and that path never decrements
>> open_fd_count.
>>
>> So the counter keeps increasing, and I could easily hit the 4k hard
>> limit with prolonged reads/writes.
>>
>>
>>
>> I changed it to use fsal_close(), which also does the decrement. After
>> this change open_fd_count looked OK.
>>
>> But recently I saw open_fd_count underflow to
>> open_fd_count=18446744073709551615
>>
>>
>>
>> So I suspect a double close. Any suggestions?
>>
>>
>>
>> Code snippet from V2.5-stable/src/FSAL/fsal_helper.c:
>>
>> fsal_status_t fsal_rdwr(struct fsal_obj_handle *obj,
>>                         fsal_io_direction_t io_direction,
>>                         uint64_t offset, size_t io_size,
>>                         size_t *bytes_moved, void *buffer,
>>                         bool *eof,
>>                         bool *sync, struct io_info *info)
>> {
>> ...
>>         loflags = obj->obj_ops.status(obj);
>>         while ((!fsal_is_open(obj))
>>                || (loflags && loflags != FSAL_O_RDWR && loflags != openflags)) {
>>                 loflags = obj->obj_ops.status(obj);
>>                 if ((!fsal_is_open(obj))
>>                     || (loflags && loflags != FSAL_O_RDWR
>>                         && loflags != openflags)) {
>>                         fsal_status = fsal_open(obj, openflags);
>>                         if (FSAL_IS_ERROR(fsal_status))
>>                                 goto out;
>>                         opened = true;
>>                 }
>>                 loflags = obj->obj_ops.status(obj);
>>         }
>> ...
>>         if ((fsal_status.major != ERR_FSAL_NOT_OPENED)
>>             && (obj->obj_ops.status(obj) != FSAL_O_CLOSED)) {
>>                 LogFullDebug(COMPONENT_FSAL,
>>                              "fsal_rdwr_plus: CLOSING file %p",
>>                              obj);
>>
>>                 fsal_status = obj->obj_ops.close(obj); /* >>>>>>>> using fsal_close here ? */
>>                 if (FSAL_IS_ERROR(fsal_status)) {
>>                         LogCrit(COMPONENT_FSAL,
>>                                 "Error closing file in fsal_rdwr_plus: %s.",
>>                                 fsal_err_txt(fsal_status));
>>                 }
>>         }
>> ...
>>         if (opened) {
>>                 fsal_status = obj->obj_ops.close(obj); /* >>>>>>>> using fsal_close here ? */
>>                 if (FSAL_IS_ERROR(fsal_status)) {
>>                         LogEvent(COMPONENT_FSAL,
>>                                  "fsal_rdwr_plus: close = %s",
>>                                  fsal_err_txt(fsal_status));
>>                         goto out;
>>                 }
>>         }
>> ...
>> }
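>>
>> For comparison, a minimal sketch of what fsal_close() adds over the raw
>> obj_ops.close() call, assuming (as described above) that it wraps the
>> close and decrements the counter. The real helper in fsal_helper.c may
>> differ, and atomic_dec_uint64_t stands in for whatever decrement
>> primitive Ganesha provides:
>>
>>         fsal_status_t fsal_close(struct fsal_obj_handle *obj_hdl)
>>         {
>>                 fsal_status_t status = obj_hdl->obj_ops.close(obj_hdl);
>>
>>                 /* the bare obj_ops.close() path skips this decrement,
>>                  * which is why the counter only ever climbs */
>>                 if (!FSAL_IS_ERROR(status))
>>                         atomic_dec_uint64_t(&open_fd_count);
>>
>>                 return status;
>>         }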
>>
>>
>>
>>
>>
>> On Tue, Jan 2, 2018 at 12:30 AM, Malahal Naineni <mala...@gmail.com>
>> wrote:
>>
>> The links I gave you will have everything you need. You should be able
>> to download gerrit reviews with "git review -d <number>" or download
>> them from the gerrit web GUI.
>>
>>
>>
>> "390496" is merged upstream, but the other one is not merged yet.
>>
>>
>>
>> $ git log --oneline --grep='Fix closing global file descriptors'
>> origin/next
>>
>> 5c2efa8f0 Fix closing global file descriptors
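>>
>> For example, to fetch the unmerged change locally (assuming the
>> git-review tool is installed and pointed at gerrithub):
>>
>>     $ git review -d 391266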
>>
>>
>> On Tue, Jan 2, 2018 at 3:22 AM, bharat singh <bharat064...@gmail.com>
>> wrote:
>>
>> Thanks Malahal
>>
>>
>>
>> Can you point me to these issues/fixes? I will try to patch V2.5-stable
>> and run my tests.
>>
>>
>>
>> Thanks,
>>
>> Bharat
>>
>>
>>
>> On Mon, Jan 1, 2018 at 10:20 AM, Malahal Naineni <mala...@gmail.com>
>> wrote:
>>
>> >> I see that mdcache keeps growing beyond the high water mark and lru
>> reclamation can’t keep up.
>>
>>
>>
>> mdcache is different from the FD cache. I don't think we found an issue
>> with mdcache itself, but we did find a couple of issues with the FD cache:
>>
>>
>>
>> 1) https://review.gerrithub.io/#/c/391266/
>>
>> 2) https://review.gerrithub.io/#/c/390496/
>>
>>
>>
>> Neither of them is in V2.5-stable at this point. We will have to
>> backport these and others soon.
>>
>>
>>
>> Regards, Malahal.
>>
>>
>>
>> On Mon, Jan 1, 2018 at 11:04 PM, bharat singh <bharat064...@gmail.com>
>> wrote:
>>
>> Adding nfs-ganesha-support..
>>
>>
>>
>>
>>
>> On Fri, Dec 29, 2017 at 11:01 AM, bharat singh <bharat064...@gmail.com>
>> wrote:
>>
>> Hello,
>>
>>
>>
>> I am testing the NFSv3 Ganesha implementation against the nfstest_io
>> tool. I see that mdcache keeps growing beyond the high water mark and
>> LRU reclamation can't keep up.
>>
>>
>>
>> [cache_lru] lru_run :INODE LRU :CRIT :Futility count exceeded.  The LRU thread is unable to make progress in reclaiming FDs.  Disabling FD cache.
>>
>> mdcache_lru_fds_available :INODE LRU :INFO :FDs above high water mark, waking LRU thread. open_fd_count=14196, lru_state.fds_hiwat=3686, lru_state.fds_lowat=2048, lru_state.fds_hard_limit=4055
>>
>>
>>
>> I am on Ganesha V2.5.2 with default config settings.
>>
>>
>>
>> So, a couple of questions:
>>
>> 1. Is Ganesha tested against this kind of tool, which does a bunch of
>> opens and closes in quick succession?
>>
>> 2. Is there a way to suppress these error messages and/or expedite the
>> LRU reclamation process?
>>
>> 3. Any suggestions regarding the use of such tools with Ganesha?
>>
>>
>>
>>
>>
>> Thanks,
>>
>> Bharat
>>
>>
>>
>>
>>
>> --
>>
>> -Bharat
>>
>>
>
>
>
> --
> -Bharat
>
>
>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
