Frank is correct.  The CLEANUP state was used because state had a weak 
reference to entries, so those entries could not actually go away until 
all the state was cleaned up.  However, they also couldn't be used for 
anything, because the underlying FSAL had indicated the object was gone. 
  This made accessing them during shutdown or unexport difficult.

MDCACHE got rid of one of the two types of refcounts that CACHE_INODE 
had, simplifying refcounting, but initially left the weak reference in 
state.

What this patch, the weak reference was removed, switching to a strong 
reference, and just depending on the refcounts to work properly.  This 
also removed the CLEANUP state necessity, therefore getting rid of that 
ESTALE return.  This patch was motivated by the exact problem you're 
seeing, infinite loop on shutdown/unexport.

Daniel

On 04/12/2017 02:48 PM, Madhu P Punjabi wrote:
> Hi Daniel,
>
> Need help to understand the change in function "_mdcache_lru_ref"
> committed with this patch.
>
> Recently we hit upon a issue with ganesha 2.3 code, where when doing an
> 'unexport' the thread was not coming out of the 'while' loop in function
> 'cache_inode_unexport'. The 'cache_inode_unexport' called
> 'cache_inode_lru_ref' which always returned 'CACHE_INODE_ESTALE' as the
> 'lru->qid=LRU_ENTRY_CLEANUP'.
>
> As 'CACHE_INODE_ESTALE' was returned to 'cache_inode_unexport' it did
> 'continue;' in the while loop where after continuing the 'while' loop
> continued working on the same cache entry. With this the thread was
> stuck in the while loop for more than 15 minutes and inside 'while' the
> 'cache_inode_lru_ref' was called again and again for the same entry.
>
> I saw somewhat similar kind of code existed in 2.5 uptil V2.5-dev-15,
> where '_mdcache_lru_ref' returned 'ERR_FSAL_STALE' for condition
> 'lru->flags & LRU_CLEANUP' and 'while' in 'mdcache_unexport' continued
> working with the same cache entry.
>
> I checked with Frank on IRC, as to why we we return ESTALE for entries
> that are in CLEANUP queue, he suggested that the cleanup queue is for
> entries that are being killed because an ESTALE came up from the FSAL.
> He also commented that the while loop looks wrong and the move from
> cache_inode to mdcache has probably changed how that works, so it
> shouldn't be an issue in 2.4+.
>
> I came across this recent commit which removed the code which was
> returning STALE error from '_mdcache_lru_ref'. I would like to
> understand why this code was removed.Please clarify. Thank you.
>
> Thanks,
> Madhu Thorat.
>
>
> Inactive hide details for GerritHub ---02/23/2017 12:03:36 AM--->From
> Daniel Gryniewicz <d...@redhat.com>: Daniel Gryniewicz haGerritHub
> ---02/23/2017 12:03:36 AM--->From Daniel Gryniewicz <d...@redhat.com>:
> Daniel Gryniewicz has uploaded a new change for review. (
>
> From: GerritHub <supp...@gerritforge.com>
> To: Frank Filz <ffilz...@mindspring.com>
> Date: 02/23/2017 12:03 AM
> Subject: [Nfs-ganesha-devel] Change in ffilz/nfs-ganesha[next]: SAL -
> Keep a pointer to the obj owning the state
>
> ------------------------------------------------------------------------
>
>
>
> Daniel Gryniewicz uploaded this change for *review*.
>
> _View Change_ <https://review.gerrithub.io/350056>
>
> SAL - Keep a pointer to the obj owning the state
>
> State keeps a refcount on the obj, and looking up via weak reference can
> cause issues when the underlying file has been deleted.  Insteah, just
> keep a pointer to the obj, since MDCACHE guaranteed the pointer is
> stable.
>
> Change-Id: Ia7c327bc694f7be21bc5a9802abfefc3f697d133
> Signed-off-by: Daniel Gryniewicz <d...@redhat.com>
> ---
> M src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_lru.c
> M src/SAL/nfs4_state.c
> M src/SAL/nfs4_state_id.c
> M src/SAL/nlm_state.c
> M src/include/sal_data.h
> M src/include/sal_functions.h
> 6 files changed, 27 insertions(+), 88 deletions(-)
>
>
> git pull ssh://review.gerrithub.io:29419/ffilz/nfs-ganesha
> refs/changes/56/350056/1
>
> To view, visit _this change_ <https://review.gerrithub.io/350056>. To
> unsubscribe, visit _settings_ <https://review.gerrithub.io/settings>.
>
> Gerrit-MessageType: newchange
> Gerrit-Change-Id: Ia7c327bc694f7be21bc5a9802abfefc3f697d133
> Gerrit-Change-Number: 350056
> Gerrit-PatchSet: 1
> Gerrit-Project: ffilz/nfs-ganesha
> Gerrit-Branch: next
> Gerrit-Owner: Daniel Gryniewicz
> <d...@redhat.com>------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org!
> http://sdm.link/slashdot_______________________________________________
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
>
>


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel

Reply via email to