Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

Daniel Gryniewicz Wed, 04 Apr 2018 09:51:29 -0700

Okay, thanks. That confirms to me that we need to do something else. I'll start to look into this ASAP.


Daniel


On 04/04/2018 12:37 PM, Pradeep wrote:

Hi Daniel,

I tried increasing the lanes to 1023. The usage looks better, but it is still over the limit:

$2 = {entries_hiwat = 100000, entries_used = 299838, chunks_hiwat = 100000,
   chunks_used = 1235, fds_system_imposed = 1048576, fds_hard_limit = 1038090,
   fds_hiwat = 943718, fds_lowat = 524288, futility = 0, per_lane_work = 50,
   biggest_window = 419430, prev_fd_count = 39434, prev_time = 1522775283,
   caching_fds = true}

I'm trying to simulate a build workload by running the SpecFS SWBUILD workload. This is with Ganesha 2.7 and FSAL_VFS. The server has 4 CPUs and 12 GB of memory.

For build 8 (40 processes), the latency increased from 5 ms (with 17 lanes) to 22 ms (with 1023 lanes), and the test failed to achieve the required IOPS.


Thanks,
Pradeep

On Tue, Apr 3, 2018 at 7:58 AM, Pradeep <pradeeptho...@gmail.com> wrote:


    Hi Daniel,

    Sure I will try that.

    One thing I tried was to not allocate new entries and instead return
    NFS4ERR_DELAY, in the hope that the raised refcnt of the entry at the
    LRU end is temporary. This worked for a while, but then I hit a case
    where all the entries at the LRU end of L1 had a refcnt of 2 and the
    entries after them had a refcnt of 1. All L2 queues were empty. I
    realized that whenever a new entry is created, its refcnt is 2 and it
    is put at the LRU end. Promotions from L2 also move entries to the LRU
    end of L1. So it is likely that many threads end up finding no
    reclaimable entry at the LRU end and allocate new entries instead.
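A minimal sketch of the decision made in this first experiment, assuming a hypothetical decide() helper and a "reaped" flag that reports whether the reaper freed an idle entry (neither is Ganesha code); only the NFS4ERR_DELAY status value (10008, from RFC 7530) comes from the protocol. It also shows why the approach can stall: if new entries stay at the LRU end with a refcnt of 2, "reaped" keeps coming back false and every cache miss above entries_hiwat is answered with a delay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NFS4_OK        0
#define NFS4ERR_DELAY  10008   /* NFSv4 status code, RFC 7530 */

/* Hypothetical outcome of one cache miss in the first experiment. */
enum alloc_decision {
	REUSE_RECLAIMED,  /* the reaper found an idle entry at the LRU end */
	ALLOCATE_NEW,     /* below entries_hiwat: grow the cache */
	REPLY_DELAY,      /* at/above entries_hiwat: make the client retry */
};

/* Never grow past entries_hiwat; ask the client to retry instead. */
static enum alloc_decision decide(bool reaped, uint64_t entries_used,
				  uint64_t entries_hiwat)
{
	if (reaped)
		return REUSE_RECLAIMED;
	if (entries_used < entries_hiwat)
		return ALLOCATE_NEW;
	return REPLY_DELAY;
}

/* Map the decision to the status that would be returned to the client. */
static int nfs_status(enum alloc_decision d)
{
	return d == REPLY_DELAY ? NFS4ERR_DELAY : NFS4_OK;
}
```

The policy only helps if busy heads are transient; it does nothing to make an idle entry reachable by the reaper.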

    Then I tried another experiment: invoke lru_wake_thread() when the
    number of entries is greater than entries_hiwat, but still allocate a
    new entry for the current thread. This worked. I had to change
    lru_run() to allow demotion when 'entries > entries_hiwat', in
    addition to the max FD check. The side effect is that it will close
    FDs and demote entries to L2. Almost all of these FDs are opened in the
    context of setattr/getattr, so the attributes are already in cache and
    the FDs are probably useless until the cache expires. I think your idea
    of moving further down the lane may be a better approach.
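A sketch of the two changes in the second experiment, assuming a simplified lru_limits struct whose field names follow the lru_state dumps in this thread (open_fds is a hypothetical stand-in for the real FD counter) and a hypothetical wake_reaper() in place of lru_wake_thread(). The allocation path never refuses the caller; it only wakes the background pass, which then demotes when either limit is exceeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields named after the lru_state dump; open_fds is hypothetical. */
struct lru_limits {
	uint64_t entries_hiwat;
	uint64_t entries_used;
	uint64_t fds_hiwat;
	uint64_t open_fds;
};

/* Stand-in for lru_wake_thread(): only records that a wakeup was requested. */
static bool reaper_wakeup_requested;

static void wake_reaper(void)
{
	reaper_wakeup_requested = true;
}

/* Allocation path: always allocate, but nudge the background thread
 * once the cache is over its high-water mark. */
static void on_new_entry(struct lru_limits *s)
{
	s->entries_used++;
	if (s->entries_used > s->entries_hiwat)
		wake_reaper();
}

/* Background pass: demote (close FDs, move entries to L2) when either
 * the FD count or the entry count is above its high-water mark. */
static bool should_demote(const struct lru_limits *s)
{
	return s->open_fds > s->fds_hiwat ||
	       s->entries_used > s->entries_hiwat;
}
```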

    I will try your suggestion next. With 1023 lanes, it is unlikely that
    all lanes will have an active entry.

    Thanks,
    Pradeep

    On 4/3/18, Daniel Gryniewicz <d...@redhat.com> wrote:
     > So, the way this is supposed to work is that getting a ref when the ref
     > is 1 is always an LRU_REQ_INITIAL ref, so that moves it to the MRU.  At
     > that point, further refs don't move it around in the queue, just
     > increment the refcount.  This should be the case, because
     > mdcache_new_entry() and mdcache_find_keyed() both get an INITIAL ref,
     > and all other refs require you to already have a pointer to the entry
     > (and therefore a ref).
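The intended reference semantics can be sketched as a simplified single-threaded model. This is not mdcache code: struct entry, struct lane, get_ref(), put_ref() and REF_INITIAL are hypothetical stand-ins for the real entry, lane, reference functions and LRU_REQ_INITIAL. The assertion in get_ref() encodes the invariant described above: an idle entry (refcnt == 1) is only ever referenced through an INITIAL ref, which moves it to the MRU end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: prev points toward the LRU end, next toward the MRU end. */
struct entry {
	struct entry *prev, *next;
	int32_t refcnt;             /* 1 == only the queue's own reference */
};

struct lane {
	struct entry *lru;          /* head: first reclaim candidate */
	struct entry *mru;          /* tail: most recently used */
};

static void lane_remove(struct lane *q, struct entry *e)
{
	if (e->prev) e->prev->next = e->next; else q->lru = e->next;
	if (e->next) e->next->prev = e->prev; else q->mru = e->prev;
	e->prev = e->next = NULL;
}

static void lane_append_mru(struct lane *q, struct entry *e)
{
	e->next = NULL;
	e->prev = q->mru;
	if (q->mru) q->mru->next = e; else q->lru = e;
	q->mru = e;
}

enum ref_flags { REF_NONE = 0, REF_INITIAL = 1 };

/* INITIAL refs move the entry to the MRU end; other refs only bump the count. */
static void get_ref(struct lane *q, struct entry *e, enum ref_flags flags)
{
	assert(e->refcnt > 1 || (flags & REF_INITIAL));
	if (flags & REF_INITIAL) {
		lane_remove(q, e);
		lane_append_mru(q, e);
	}
	e->refcnt++;
}

static void put_ref(struct entry *e)
{
	assert(e->refcnt > 1);      /* never drop the queue's own reference here */
	e->refcnt--;
}
```

Under this model an entry at the LRU end with refcnt > 1 should only exist briefly, which is what makes the snapshot in this thread surprising.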
     >
     > Can you try something, since you have a reproducer?  It seems that, with
     > 1.7 million files, 17 lanes may be a bit low.  Can you try with
     > something ridiculously large, like 1023, and see if that makes a
     > difference?
     >
     > I suspect we'll have to add logic to move further down the lanes if
     > futility hits.
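One possible shape for moving further down the lanes, sketched under the assumption that each lane can be reduced to a singly linked list starting at its LRU end; reap_any_lane() and its max_tries bound are hypothetical, not existing Ganesha code. Instead of giving up when the head of the chosen lane is busy, it tries the following lanes, wrapping around, up to a fixed number of lanes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified entry; next points toward the MRU end. */
struct entry {
	struct entry *next;
	int32_t refcnt;             /* 1 == idle and reclaimable */
};

struct lane {
	struct entry *lru;          /* NULL if the lane is empty */
};

/* Start at the lane chosen for this request; if its head is busy or the
 * lane is empty, try the following lanes (wrapping) up to max_tries lanes.
 * Returns the unlinked idle head, or NULL if none was found. */
static struct entry *reap_any_lane(struct lane *lanes, size_t nlanes,
				   size_t start, size_t max_tries)
{
	for (size_t i = 0; i < max_tries && i < nlanes; i++) {
		struct lane *l = &lanes[(start + i) % nlanes];

		if (l->lru != NULL && l->lru->refcnt == 1) {
			struct entry *e = l->lru;

			l->lru = e->next;   /* unlink the idle head */
			e->next = NULL;
			return e;
		}
	}
	return NULL;
}
```

Bounding max_tries keeps the cost of a single reap predictable even with a large number of lanes.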
     >
     > Daniel
     >
     > On 04/02/2018 12:30 PM, Pradeep wrote:
     >> We discussed this a while ago. I'm running into this again with 2.6.0.
     >> Here is a snapshot of the lru_state (I set the max entries to 10):
     >>
     >> {entries_hiwat = 200000, entries_used = 1772870, chunks_hiwat = 100000,
     >>    chunks_used = 16371, lru_reap_l1 = 8116842, lru_reap_l2 = 1637334,
     >>    lru_reap_failed = 1637334, attr_from_cache = 31917512,
     >>    attr_from_cache_for_client = 5975849, fds_system_imposed = 1048576,
     >>    fds_hard_limit = 1038090, fds_hiwat = 943718, fds_lowat = 524288,
     >>    futility = 0, per_lane_work = 50, biggest_window = 419430,
     >>    prev_fd_count = 0, prev_time = 1522647830, caching_fds = true}
     >>
     >> As you can see, it has grown well beyond the limit set (1.7 million vs.
     >> 200K max size). lru_reap_failed indicates the number of times the reap
     >> failed from both L1 and L2.
     >> I'm wondering what can cause the reap to fail once it reaches a steady
     >> state. It appears to me that the entry at the LRU end (head of the
     >> queue) is actually in use (refcnt > 1) while there are entries in the
     >> queue with refcnt == 1, but those are not being looked at. My
     >> understanding is that if an entry is accessed, it must move to the MRU
     >> end (tail of the queue). Any idea why the entry at the LRU end can have
     >> a refcnt > 1?
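The failure mode described above can be modelled with a singly linked lane, assuming a hypothetical struct entry whose next pointer runs toward the MRU end. reap_head_only() mirrors the behaviour in question: it inspects only the LRU-end entry, so a busy head makes the reap fail even when idle entries sit right behind it. reap_scan() is a hypothetical bounded alternative shown for contrast, not what Ganesha does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified entry; next points toward the MRU end. */
struct entry {
	struct entry *next;
	int32_t refcnt;             /* 1 == idle (only the queue's reference) */
};

/* Head-only reap: if the LRU-end entry is in use, give up. */
static struct entry *reap_head_only(struct entry *lru)
{
	if (lru != NULL && lru->refcnt == 1)
		return lru;
	return NULL;                /* counted as a failure; caller allocates */
}

/* Hypothetical alternative: inspect at most 'budget' entries for an idle one. */
static struct entry *reap_scan(struct entry *lru, int budget)
{
	for (struct entry *e = lru; e != NULL && budget-- > 0; e = e->next)
		if (e->refcnt == 1)
			return e;
	return NULL;
}
```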
     >>
     >> This can happen if the refcnt is incremented without the QLOCK: if
     >> lru_reap_impl() is called at the same time from another thread, it
     >> will skip the first entry and return NULL. The increment was done this
     >> way in _mdcache_lru_ref(), which could cause the refcnt on the head of
     >> the queue to be incremented while another thread holding the QLOCK is
     >> looking at it. I tried moving the increment/dequeue in
     >> _mdcache_lru_ref() inside the QLOCK, but that did not help.
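A sketch of the locking change described above, assuming a per-lane pthread mutex standing in for the QLOCK and simplified hypothetical entry and lane types. Taking the ref and moving the entry to the MRU end happen as one step under the lock, and the reaper inspects the head under the same lock, so it cannot observe a raised refcnt on an entry that is about to leave the head. This closes only that race; entries that sit at the LRU end with a legitimately raised refcnt (for example new entries inserted there with a refcnt of 2) would still make the reap fail, which may be why the change did not help.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: prev toward the LRU end, next toward the MRU end. */
struct entry {
	struct entry *prev, *next;
	int32_t refcnt;             /* 1 == idle */
};

struct lane {
	pthread_mutex_t qlock;      /* stands in for the QLOCK */
	struct entry *lru, *mru;
};

/* Caller must hold q->qlock. */
static void unlink_locked(struct lane *q, struct entry *e)
{
	if (e->prev) e->prev->next = e->next; else q->lru = e->next;
	if (e->next) e->next->prev = e->prev; else q->mru = e->prev;
	e->prev = e->next = NULL;
}

/* Caller must hold q->qlock. */
static void append_mru_locked(struct lane *q, struct entry *e)
{
	e->next = NULL;
	e->prev = q->mru;
	if (q->mru) q->mru->next = e; else q->lru = e;
	q->mru = e;
}

/* Increment and requeue as a single step under the lane lock. */
static void ref_and_requeue(struct lane *q, struct entry *e)
{
	pthread_mutex_lock(&q->qlock);
	assert(e->refcnt >= 1);     /* the entry must still be live */
	e->refcnt++;
	unlink_locked(q, e);
	append_mru_locked(q, e);
	pthread_mutex_unlock(&q->qlock);
}

/* Reaper: inspect the head under the same lock; unlink it only if idle. */
static struct entry *reap_head(struct lane *q)
{
	struct entry *victim = NULL;

	pthread_mutex_lock(&q->qlock);
	if (q->lru != NULL && q->lru->refcnt == 1) {
		victim = q->lru;
		unlink_locked(q, victim);
	}
	pthread_mutex_unlock(&q->qlock);
	return victim;
}
```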
     >>
     >> Also, if get_ref() is called for the entry at the LRU end for some
     >> reason, it will just increment the refcnt and return. I think the
     >> assumption is that by the time get_ref() is called, the entry is
     >> supposed to be out of the LRU.
     >>
     >>
     >> Thanks,
     >> Pradeep
     >>
     >



_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
