The problem is that "fd"s are an internal detail that we do not want leaking out into the general server.
There are several reasons for this adduced in my and Adam's comments in gerrit.

I may be missing something, but what is the generic (non-VFS or few other FSAL-specific) concept that is being represented here? Is it an OPEN complex?

Matt

----- "Frank Filz" <ffilz...@mindspring.com> wrote:

> > Hi Frank, list.
> >
> > Frank Filz wrote on Mon, Jul 27, 2015 at 09:33:54AM -0700:
> > > FSAL_LUSTRE actually also uses fcntl to get POSIX locks, so it has
> > > the same issues as FSAL_VFS.
> >
> > LUSTRE would actually like one fd per open or equivalent if we can
> > reap them efficiently, for the same reason as GPFS -- sharing an fd
> > kills readahead behaviour, and lustre needs the client to do
> > readahead and has us disable readahead disk-side, so ganesha performs
> > quite a bit worse with multiple clients.
>
> Well, I guess that's close enough to what we're doing anyway.
>
> Fds that are associated with NFS 4 OPEN or NFS 3 SHARE will be closed
> appropriately. If the FSAL didn't need to keep them open, we could
> enhance the cache inode LRU to be able to consider pinned entries for
> advisory file close. The FSAL could also try to do something to keep
> track, and I dunno, LRU garbage collection of idle file descriptors
> might actually belong in the FSAL anyway instead of cache inode.
>
> > > Now I don't know about other FSALs, though I wouldn't be surprised
> > > if at least some other FSALs might benefit from some mechanism to
> > > associate SOMETHING with each lock owner per file.
> > >
> > > So I came up with the idea of the FSAL providing the size of the
> > > "thing" (which I have currently named fsal_fd, but we can change
> > > the name if it would make folks feel more comfortable) with each
> > > Ganesha stateid. And then, to bring NFS v3 locks and share
> > > reservations into the picture, I've made them able to use stateids
> > > (well, state_t structures), which also cleaned up part of the SAL
> > > lock interface (since NFS v3 can hide its "state" value in the
> > > state_t and stop overloading state_t *state)...
> >
> > 9P has a state_t per fid (which will translate into per open), so
> > that should work well for us as well.
>
> Yep, that's why I added state_t to 9P. There's actually a complication
> when an FSAL actually needs both share_fds and lock_fds. I will be
> looking into that some more.
>
> > > Now each FSAL gets to define exactly what an fsal_fd actually is.
> > > At the moment I have the open mode in the generic structure, but
> > > maybe it can move into the FSAL private fsal_fd.
> > >
> > > Not all FSALs will use both open and lock fds, and we could
> > > provide separate sizes for them so space would only be allocated
> > > when necessary (and if zero, a NULL pointer is passed).
> >
> > sounds good.
> >
> > > For most operations, the object_handle and both a share_fd and a
> > > lock_fd are passed. This allows the FSAL to decide exactly which
> > > ones it needs (if it needs a generic "thing" for anonymous I/O, it
> > > can stash a generic fsal_fd in its object_handle).
> >
> > They're passed with what we have, and it's left to the FSAL function
> > to open if it needs something?
> > One of the problems we have at the moment in VFS is fstat() racing
> > on the "generic thing", which can be closed/opened in other threads.
> > Do we leave it up to FSALs to protect their generic bits then?
> > (since we can't know which FSAL call will use which bit, doing the
> > locking ourselves would be too much for most)
>
> Note the FSAL_VFS fstat race has been resolved (by protecting the fd
> inside the FSAL - I plan to continue this migration, allowing the FSAL
> to protect its file descriptors or whatever they are). And yes, it's
> up to the FSAL when it actually needs to open something.
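
For concreteness, here is a rough sketch of the shape I understand is
being proposed -- every identifier below is hypothetical and mine, not
the actual Ganesha API: the FSAL advertises the size of its private fd
structure, space is allocated alongside each state_t, and the FSAL
opens lazily and protects its own global fd, per the note above about
the fstat race.

/*
 * Hypothetical sketch only -- none of these names are the real API.
 */
#include <fcntl.h>
#include <limits.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/stat.h>

struct fsal_fd {                   /* generic part; FSAL data follows */
        int openflags;             /* mode we opened with (valid while open) */
};

struct vfs_fd {                    /* what a VFS-like FSAL might use */
        struct fsal_fd fsal_fd;
        int fd;                    /* real POSIX descriptor, -1 when closed */
};

/* Advertised by the FSAL; a size of zero would mean "pass NULL". */
static const size_t share_fd_size = sizeof(struct vfs_fd);
static const size_t lock_fd_size = sizeof(struct vfs_fd);

struct vfs_obj_handle {
        pthread_rwlock_t fd_lock;  /* guards global_fd open/close */
        struct vfs_fd global_fd;   /* generic "thing" for anonymous I/O */
        char path[PATH_MAX];       /* stand-in for a real handle lookup */
};

/* Open lazily on first use, so idle states never pin a descriptor. */
static int vfs_fd_get(struct vfs_fd *vfd, const char *path, int flags)
{
        if (vfd->fd < 0) {
                vfd->fd = open(path, flags);
                if (vfd->fd >= 0)
                        vfd->fsal_fd.openflags = flags;
        }
        return vfd->fd;
}

/* fstat without racing a close/reopen of the global fd elsewhere. */
static int vfs_getattrs(struct vfs_obj_handle *hdl, struct stat *st)
{
        int rc = -1;

        /* Take the lock for write, since we may have to open the fd. */
        pthread_rwlock_wrlock(&hdl->fd_lock);
        if (vfs_fd_get(&hdl->global_fd, hdl->path, O_RDONLY) >= 0)
                rc = fstat(hdl->global_fd.fd, st);
        pthread_rwlock_unlock(&hdl->fd_lock);
        return rc;
}

The point of the split is that only the small header is generic;
everything past it is opaque to the common layers.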
> Note that lock_fds can get left open longer than necessary,
> particularly in NFS v4.0, since there isn't a way for the client to
> indicate it's done with the lock stateid (for NFS v3 the current code
> will result in a close of the lock_fd any time the lock owner drops
> to zero locks on a file - I think we may want to proactively allow
> for keeping the state_t around a bit longer, and thus allow the FSAL
> the option of also keeping the fd (or whatever) open longer).
>
> > > The benefit of this interface is that the FSAL can leverage the
> > > cache inode AVL tree and SAL hash tables without making upcalls
> > > (which would be fraught with locking issues) or duplicating hash
> > > tables in order to store information entirely separately. It also
> > > may open possibilities for better hinting between the layers so
> > > garbage collection can be improved.
> >
> > yay. (just need to make sure we "close" anything that has been left
> > open on GC, so we need a common-level flag to say it's in use,
> > maybe?)
>
> The fsal_fds that are attached to a state_t have a pretty well
> defined lifetime. The global fsal_fd attached to the object_handle
> already has a reasonable mechanism in place for managing its
> lifetime, though we might want to move that management into the FSAL
> as mentioned above.
>
> > > In the meantime, the legacy interface is still available. If some
> > > FSALs truly will never need this function, but they want to
> > > benefit from the atomic open/create/setattr capability of the
> > > FSAL open_fd method, we can make a more generic version of that
> > > (well, actually, it just needs to be OK having NULL passed for
> > > the fsal_fd).
> >
> > I think we should make FSALs so that it's always OK whatever is
> > given, but that's a bit more work (esp. internal locking for the
> > generic fd, as mentioned previously)
>
> The goal would actually be to eliminate the legacy API. If this
> proposed API really doesn't fit some particular FSAL or FSALs, we can
> consider the best way to converge on a reasonable API (even if, for
> some things, there are two families of calls).
>
> > Looks like a good draft anyway, I'm sure we can work out the
> > details.
>
> Details have changed as I've moved from my initial API definition to
> implementation... They will change some more...
>
> Frank

--
Matt Benjamin
CohortFS, LLC.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103
http://cohortfs.com

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309
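
P.S. A similarly hypothetical sketch of the "LRU garbage collection of
idle file descriptors in the FSAL" idea from the thread above; the
in_use field plays the role of the common-level "still in use" flag
asked about, so GC never closes a descriptor out from under an
operation. The caller is assumed to serialize access to the list.

#include <stdbool.h>
#include <sys/queue.h>
#include <unistd.h>

struct idle_fd {
        TAILQ_ENTRY(idle_fd) lru;  /* least recently used at the head */
        int fd;
        bool in_use;               /* set while an operation holds the fd */
};

TAILQ_HEAD(fd_lru, idle_fd);

/* Close up to 'budget' idle descriptors, oldest first. */
static void reap_idle_fds(struct fd_lru *lru, int budget)
{
        struct idle_fd *entry = TAILQ_FIRST(lru);

        while (entry != NULL && budget > 0) {
                struct idle_fd *next = TAILQ_NEXT(entry, lru);

                if (!entry->in_use) {  /* GC must skip anything live */
                        TAILQ_REMOVE(lru, entry, lru);
                        close(entry->fd);
                        entry->fd = -1;
                        budget--;
                }
                entry = next;
        }
}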