Just looking at the code, I don't see where you retry the lock request from
the FSAL, which is required to add it back to the FSAL queue.
Am I missing something?
Marc.



From:   "Frank Filz" <ffilz...@mindspring.com>
To:     Marc Eshel/Almaden/IBM@IBMUS
Cc:     "'nfs-ganesha-devel'" <nfs-ganesha-devel@lists.sourceforge.net>
Date:   09/07/2016 03:57 PM
Subject:        RE: [Nfs-ganesha-devel] NLM async locking



Marc,

Could you try the top commit in this branch:

https://github.com/ffilz/nfs-ganesha/commits/async

It may not be the complete solution, but I think it will help your 
scenario.

I need to do more work on async blocking locks...

And it looks like without async blocking lock support, Ganesha doesn't
handle the case where a lock blocks on a conflicting lock from outside the
Ganesha instance. I will be looking at implementing my thread pool idea that
I modeled in the multilock tool.

Frank

> -----Original Message-----
> From: Frank Filz [mailto:ffilz...@mindspring.com]
> Sent: Wednesday, September 7, 2016 9:42 AM
> To: 'Marc Eshel' <es...@us.ibm.com>
> Cc: 'nfs-ganesha-devel' <nfs-ganesha-devel@lists.sourceforge.net>
> Subject: Re: [Nfs-ganesha-devel] NLM async locking
> 
> Ok, I'm not sure this ever worked right...
> 
> With the lock available upcall, we never put the lock back on the blocked
> lock list if an attempt to acquire the lock from the FSAL fails...
> 
> So the way the lock available upcall is supposed to work:
> 
> Client requests conflicting lock
> Blocked lock gets registered by FSAL
> SAL puts lock on blocked lock list
> Time passes
> FSAL makes lock available upcall
> SAL finds the blocked lock entry in the blocked lock list
> SAL makes a call to FSAL to attempt to acquire the lock
> Assume that fails (in the example, because multiple conflicting locks got
> notified)
> SAL puts the lock BACK on the blocked lock list (this step is missing) and
> all is well....
> Time passes
> FSAL makes lock available upcall
> SAL finds the blocked lock entry in the blocked lock list
> SAL makes a call to FSAL to attempt to acquire the lock
> Lock is granted by FSAL
> SAL makes async call back to client
> If THAT fails, SAL releases the lock from the FSAL and disposes of the lock
> entry and all is well
> If THAT succeeds, the lock is completely granted and all is well
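The sequence above can be sketched in a few lines of C. This is only an illustration of the re-queue step, not Ganesha code: `blocked_list`, `fsal_try_lock`, `lock_available_upcall` and the result enum are hypothetical names, the "FSAL" is a single global holder, and there is exactly one lock with no byte ranges.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKED 8

/* Hypothetical stand-in for a blocked lock entry; not Ganesha's real type. */
struct blocked_lock {
	int owner;	/* client id */
	bool in_use;	/* slot currently holds a queued entry */
};

static struct blocked_lock blocked_list[MAX_BLOCKED];

/* Hypothetical FSAL state: 0 means the lock is free, else the holder's id. */
static int lock_holder;

/* Put an entry on the blocked lock list; false if the list is full. */
static bool blocked_list_add(int owner)
{
	for (int i = 0; i < MAX_BLOCKED; i++) {
		if (!blocked_list[i].in_use) {
			blocked_list[i].owner = owner;
			blocked_list[i].in_use = true;
			return true;
		}
	}
	return false;
}

/* Take the entry for owner off the list; false if it was not queued. */
static bool blocked_list_remove(int owner)
{
	for (int i = 0; i < MAX_BLOCKED; i++) {
		if (blocked_list[i].in_use && blocked_list[i].owner == owner) {
			blocked_list[i].in_use = false;
			return true;
		}
	}
	return false;
}

/* Hypothetical FSAL call: fails while a different owner holds the lock. */
static bool fsal_try_lock(int owner)
{
	if (lock_holder != 0 && lock_holder != owner)
		return false;
	lock_holder = owner;
	return true;
}

enum upcall_result { UPCALL_GRANTED, UPCALL_REQUEUED, UPCALL_NOT_FOUND };

/* Handle a "lock available" upcall for one blocked owner. */
static enum upcall_result lock_available_upcall(int owner)
{
	if (!blocked_list_remove(owner))
		return UPCALL_NOT_FOUND;	/* the "out of sync" case */

	if (fsal_try_lock(owner))
		return UPCALL_GRANTED;

	/* The step described as missing: put the entry BACK on the list. */
	bool requeued = blocked_list_add(owner);

	assert(requeued);	/* a slot was freed just above */
	(void)requeued;
	return UPCALL_REQUEUED;
}
```

With owners 2 and 3 queued and the lock free, calling `lock_available_upcall(2)` and then `lock_available_upcall(3)` grants the first and re-queues the second, so a later upcall for owner 3 still finds its entry instead of hitting the not-found path.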
> 
> I also see that if the client retries the lock before it is granted, we
> don't remove the lock entry from the blocked lock list... I don't think that
> will ever cause a problem but we should clean that up also...
> 
> Let me try a patch to fix...
> 
> Frank
> 
> > -----Original Message-----
> > From: Marc Eshel [mailto:es...@us.ibm.com]
> > Sent: Tuesday, September 6, 2016 9:34 PM
> > To: Frank Filz <ffilz...@mindspring.com>
> > Cc: 'nfs-ganesha-devel' <nfs-ganesha-devel@lists.sourceforge.net>
> > Subject: RE: NLM async locking
> >
> > Did you get a chance to look at this problem?
> > Marc.
> >
> >
> >
> > From:   "Frank Filz" <ffilz...@mindspring.com>
> > To:     Marc Eshel/Almaden/IBM@IBMUS
> > Cc:     "'nfs-ganesha-devel'" <nfs-ganesha-devel@lists.sourceforge.net>
> > Date:   08/29/2016 02:37 PM
> > Subject:        RE: NLM async locking
> >
> >
> >
> > > I see the following failure:
> > > 1. Get conflicting locks from 3 clients
> > >     cli 1 gets 0-100
> > >     cli 2 is blocked on 0-1000
> > >     cli 3 is blocked on 0-10000
> > > 2. cli 1 unlocks
> > >     up-call for cli 2 and 3 to retry
> > >     cli 2 gets 0-1000
> > >     cli 3 is blocked on 0-1000
> > > 3. cli 2 unlocks
> > >     up-call for cli 3 but Ganesha fails
> > >
> > >         /* We must be out of sync with FSAL, this is fatal */
> > >         LogLockDesc(COMPONENT_STATE, NIV_MAJ,
> > >                     "Blocked Lock Not Found for",
> > >                     obj, owner, lock);
> > >         LogFatal(COMPONENT_STATE, "Locks out of sync with FSAL");
> > >
> > > I think the problem is in step 2: after cli 3 failed for the second
> > > time, it is not put back in the queue, the sbd_list.
> > >
> > > Can you please confirm? This logic is very complicated.
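To make the three-client scenario concrete, here is a small C sketch of the range check behind it. It assumes the ranges in the report ("0-100", "0-1000", "0-10000") are inclusive byte ranges; `struct byte_range`, `ranges_conflict` and `count_waiters_to_notify` are hypothetical names for illustration and are not taken from Ganesha or any FSAL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical byte-range lock; both bounds are assumed inclusive. */
struct byte_range {
	uint64_t start;
	uint64_t end;
};

/* Two inclusive ranges conflict unless one ends before the other starts. */
static bool ranges_conflict(struct byte_range a, struct byte_range b)
{
	return a.start <= b.end && b.start <= a.end;
}

/*
 * Count the waiters whose requested range overlaps the range that was
 * just released, i.e. the waiters that would be told to retry.
 */
static int count_waiters_to_notify(struct byte_range released,
				   const struct byte_range *waiters, int n)
{
	int count = 0;

	for (int i = 0; i < n; i++) {
		if (ranges_conflict(released, waiters[i]))
			count++;
	}
	return count;
}
```

Under that assumption, releasing 0-100 overlaps both waiting requests, so both cli 2 and cli 3 are told to retry; because 0-1000 and 0-10000 also overlap each other, only one of the two retries can succeed and the other has to go back to waiting.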
> >
> > That sounds like a likely problem. I'd have to dig into the code to
> > see why... May take me a day or two to investigate.
> >
> > Frank
> >
> >
> >
> > ---
> > This email has been checked for viruses by Avast antivirus software.
> > https://www.avast.com/antivirus
> >
> >
> >
> 
> 
> 
> ------------------------------------------------------------------------------
> _______________________________________________
> Nfs-ganesha-devel mailing list
> Nfs-ganesha-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel


