> The problem is that "fd"s are an internal detail that we do not want leaking
> out into the general server.
> 
> There are several reasons for this adduced in my and Adam's comments in
> gerrit.
> 
> I may be missing something, but what is the generic (non-VFS or few other
> FSAL-specific) concept that is being represented here?  Is it an OPEN
> complex?

From what I can tell, this is useful to more than just a few FSALs. Now maybe 
"file descriptor" is not the best choice (since FSAL_PROXY is going to use the 
"thing" to stash information about stateids from the upstream server, which 
will be different from, though perhaps related to, Ganesha's stateids).

It's context to associate with share reservations (NFS 4 OPEN, NLM SHARE) and 
lock state. Different FSALs will have different context, but perhaps not all 
FSALs will need the context.

But having a "few" FSALs need an API also isn't a good reason to turn down that 
API.

I am working to make this "thing" as generic as makes sense. If there's some 
FSAL that has needs that I haven't considered, then please share those needs 
rather than shooting down ideas.
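To make the idea concrete, here is a rough sketch of the kind of layout I have
in mind. All names and fields here are illustrative only, not the actual patch:

```c
/* Illustrative sketch only - names and fields are hypothetical, not the
 * actual patch.  The idea: the FSAL advertises how big its per-state
 * "thing" is, and the SAL allocates that much opaque space alongside
 * each state_t. */
#include <assert.h>
#include <stdlib.h>

struct fsal_fd {
	int openflags;		/* generic open mode, for now */
	char fsal_private[];	/* FSAL-specific context follows */
};

struct state_t {
	int state_type;		/* share, lock, delegation, ... */
	struct fsal_fd *fd;	/* NULL if the FSAL advertised size 0 */
};

/* Allocate a state_t plus the FSAL's context in one step. */
static struct state_t *alloc_state(size_t fsal_fd_size)
{
	struct state_t *s = calloc(1, sizeof(*s));

	if (s == NULL)
		return NULL;
	if (fsal_fd_size > 0) {
		s->fd = calloc(1, sizeof(struct fsal_fd) + fsal_fd_size);
		if (s->fd == NULL) {
			free(s);
			return NULL;
		}
	}
	return s;
}
```

An FSAL that needs no context would advertise 0 and simply see a NULL fd.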

Frank

> > > Hi Frank, list.
> > >
> > > Frank Filz wrote on Mon, Jul 27, 2015 at 09:33:54AM -0700:
> > > > FSAL_LUSTRE actually also uses fcntl to get POSIX locks, so it has
> > > > the same issues as FSAL_VFS.
> > >
> > > LUSTRE would actually like one fd per open or equivalent if we can
> > > reap them efficiently, for the same reason as GPFS -- sharing a fd
> > > kills readahead behaviour, and Lustre needs the client to do
> > > readahead and has us disable readahead disk-side, so Ganesha performs
> > > quite a bit worse with multiple clients.
> > >
> > > Well, I guess that's close enough to what we're doing anyway.
> >
> > Fds that are associated with an NFS 4 OPEN or NFS 3 SHARE will be
> > closed appropriately. If the FSAL didn't need to keep them open, we
> > could enhance the cache inode LRU to be able to consider pinned
> > entries for advisory file close. The FSAL could also try to do
> > something to keep track itself, and LRU garbage collection of idle
> > file descriptors might actually belong in the FSAL anyway instead of
> > cache inode.
> >
> > > > Now I don't know about other FSALs, though I wouldn't be surprised
> > > > if at least some other FSALs might benefit from some mechanism to
> > > > associate SOMETHING with each lock owner per file.
> > > >
> > > > So I came up with the idea of the FSAL providing the size of the
> > > > "thing" (which I have currently named fsal_fd, but we can change
> > > > the name if it would make folks feel more comfortable) with each
> > > > Ganesha stateid. And then to bring NFS v3 locks and share
> > > > reservations into the picture, I've made them able to use stateids
> > > > (well, state_t structures), which also cleaned up part of the SAL
> > > > lock interface (since NFS v3 can hide its "state" value in the
> > > > state_t and stop overloading state_t *state)...
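To illustrate what I mean by NFS v3 hiding its "state" value in the state_t
(again a hypothetical sketch, not the actual patch):

```c
/* Hypothetical sketch: instead of overloading a separate state argument,
 * NFS v3 (NLM) stashes its state value inside the state_t, so the SAL
 * lock interface only ever sees a state_t pointer. */
#include <assert.h>
#include <stdint.h>

enum state_proto { STATE_NFS4, STATE_NLM, STATE_9P };

struct state_t {
	enum state_proto proto;
	union {
		struct {
			uint32_t seqid;
			char other[12];
		} v4_stateid;		/* NFS v4 stateid */
		int32_t nlm_state;	/* NFS v3/NLM "state" value */
	} u;
};

/* The lock interface now only needs the state_t to recover the NLM
 * state; returns -1 if this state_t isn't an NLM one. */
static int32_t get_nlm_state(const struct state_t *s)
{
	return s->proto == STATE_NLM ? s->u.nlm_state : -1;
}
```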
> > >
> > > 9P has a state_t per fid (which will translate into per open), so
> > > that should work well for us as well.
> >
> > Yep, that's why I added state_t to 9P. There's actually a complication
> > when an FSAL needs both share_fds and lock_fds. I will be looking into
> > that some more.
> >
> > > > Now each FSAL gets to define exactly what an fsal_fd actually is.
> > > > At the moment I have the open mode in the generic structure, but
> > > > maybe it can move into the FSAL-private fsal_fd.
> > > >
> > > > Not all FSALs will use both open and lock fds, and we could provide
> > > > separate sizes for them so space would only be allocated when
> > > > necessary (and if zero, a NULL pointer is passed).
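As a sketch of how the separate sizes might be reported and used (hypothetical
names, not the actual API):

```c
/* Hypothetical sketch of per-FSAL size reporting.  An FSAL that needs
 * no lock context reports 0 and is simply handed NULL for its lock_fd. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct fsal_fd_sizes {
	size_t share_fd_size;	/* 0 => no share context needed */
	size_t lock_fd_size;	/* 0 => no lock context needed */
};

/* Allocate one kind of fd context; a size of zero yields NULL, which
 * is then what gets passed down to the FSAL's operations. */
static void *alloc_fd(size_t size)
{
	return size > 0 ? calloc(1, size) : NULL;
}
```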
> > >
> > > sounds good.
> > >
> > > > For most operations, the object_handle and both a share_fd and
> > > > lock_fd are passed. This allows the FSAL to decide exactly which
> > > > ones it needs (if it needs a generic "thing" for anonymous I/O, it
> > > > can stash a generic fsal_fd in its object_handle).
> > >
> > > They're passed with what we have, and it's left to the FSAL function
> > > to open if it needs something?
> > > One of the problems we have at the moment in VFS is fstat() racing on
> > > the "generic thing", which can be closed/opened in other threads.
> > > Do we leave it up to FSALs to protect their generic bits then? (since
> > > we can't know which FSAL call will use which bit, doing locking
> > > ourselves would be too much for most)
> >
> > Note the FSAL_VFS fstat race has been resolved (by protecting the fd
> > inside the FSAL - I plan to continue this migration, allowing the FSAL
> > to protect its file descriptors or whatever they are). And yes, it's
> > up to the FSAL when it actually needs to open something.
> >
> > Note that lock_fds can get left open longer than necessary,
> > particularly in NFS v4.0, since there isn't a way for the client to
> > indicate it's done with the lock stateid. (For NFS v3 the current code
> > will result in a close of the lock_fd any time the lock owner drops to
> > zero locks on a file - I think we may want to proactively allow for
> > keeping the state_t around a bit longer, and thus allow the FSAL the
> > option of also keeping the fd (or whatever) open longer.)
> >
> > > > The benefit of this interface is that the FSAL can leverage the
> > > > cache inode AVL tree and SAL hash tables without making upcalls
> > > > (that would be fraught with locking issues) or duplicating hash
> > > > tables in order to store information entirely separately. It also
> > > > may open possibilities for better hinting between the layers so
> > > > garbage collection can be improved.
> > >
> > > yay. (just need to make sure we "close" anything that has been left
> > > open on GC, so need a common level flag to say it's used maybe?)
> >
> > The fsal_fds that are attached to a state_t have a pretty well-defined
> > lifetime. The global fsal_fd attached to the object_handle already has
> > a reasonable mechanism in place for managing its lifetime, though we
> > might want to move that management into the FSAL as mentioned above.
> >
> > > > In the meantime, the legacy interface is still available. If truly
> > > > some FSALs will never need this function, but they want to benefit
> > > > from the atomic open/create/setattr capability of the FSAL open_fd
> > > > method, we can make a more generic version of that (well, actually,
> > > > it just needs to be OK having NULL passed for the fsal_fd).
> > >
> > > I think we should make FSALs so that it's always OK whatever is
> > > given, but that's a bit more work (esp. internal locking for the
> > > generic fd as mentioned previously)
> >
> > The goal would actually be to eliminate the legacy API. If this
> > proposed API really doesn't fit some particular FSAL or FSALs, we can
> > consider the best way to converge to a reasonable API (even if for
> > some things, there are two families of calls).
> >
> > > Looks like a good draft anyway, I'm sure we can work the details.
> >
> > Details have changed as I've moved from my initial API definition to
> > implementation... They will change some more...
> >
> > Frank
> >
> >
> > ------------------------------------------------------------------------------
> > _______________________________________________
> > Nfs-ganesha-devel mailing list
> > Nfs-ganesha-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> 
> --
> Matt Benjamin
> CohortFS, LLC.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
> 
> http://cohortfs.com
> 
> tel.  734-761-4689
> fax.  734-769-8938
> cel.  734-216-5309

