Hi,

----- "Frank Filz" <ffilz...@mindspring.com> wrote:

> > The problem is that "fd"s are an internal detail that we do not want
> leaking
> > out into the general server.
> > 
> > There are several reasons for this adduced in my and Adam's comments
> in
> > gerrit.
> > 
> > I may be missing something, but what is the generic (non-VFS or few
> other
> > FSAL-specific) concept that is being represented here?  Is it an
> OPEN
> > complex?
> 
> From what I can tell, this is useful to more than just a few FSALs.
> Now maybe "file descriptor" is not the best choice (since FSAL_PROXY
> is going to use the "thing" to stash information about stateids from
> the upstream server (which will be different though perhaps related to
> Ganesha's stateids)).

I think well-defining the abstraction is key.  Adam didn't intuit that
the issue was just one of naming, although I agree, I the name "fd"
may be distracting.

But I am more concerned about interfaces.  I share Dan's (and perhaps
Adam's) intuition that we'd like to unify around a smaller number of
objects, and in making what the general server is doing more transparent
to FSAL, when that can make for a cleaner interface (we're not trying
to hide state from FSAL, but we might be interested in avoiding the need
to allocate more free-standing blobs.  So I liked Dan's suggestion to
pass states to FSAL calls.  Could we talk more about that idea?

> 
> It's context to associate with share reservations (NFS 4 OPEN, NLM
> SHARE) and lock state. Different FSALs will have different context,
> but perhaps not all FSALs will need the context.
> 
> But having a "few" FSALs need an API also isn't a good reason to turn
> down that API.
> 
> I am working to make this "thing" as generic as makes sense. If
> there's some FSAL that has needs that I haven't considered, then
> please share those needs rather than shooting down ideas.

I'm very happy to share needs, but I care about implementation decisions,
too.

Matt

> 
> Frank
> 
> > > > Hi Frank, list.
> > > >
> > > > Frank Filz wrote on Mon, Jul 27, 2015 at 09:33:54AM -0700:
> > > > > FSAL_LUSTRE actually also uses fcntl to get POSIX locks, so it
> has
> > > the
> > > > > same issues as FSAL_VFS.
> > > >
> > > > LUSTRE would actually like one fd per open or equivalent if we
> can
> > > reap them
> > > > efficiently, for the same reason as GPFS -- sharing a fd kills
> > > readahead
> > > > behaviour and lustre needs the client to do readahead and has
> us
> > > disable
> > > > readahead disk-side, so ganesha performs quite worse with
> multiple
> > > clients.
> > > >
> > > > Well, I guess that's close enough to what we're doing anyway.
> > >
> > > Fds that are associated with NFS 4 OPEN or NFS 3 SHARE will be
> closed
> > > appropriately. If the FSAL didn't need to keep them open, we
> could
> > > enhance the cache inode LRU to be able to consider pinned entries
> for
> > > advisory file close. The FSAL could also try to do something to
> keep
> > > track and I dunno, LRU garbage collection of idle file
> descriptors
> > > might actually belong in the FSAL anyway instead of cache inode.
> > >
> > > > > Now I don't know about other FSALs, though I wouldn't be
> surprised
> > > if
> > > > > at least some other FSALs might benefit from some mechanism
> to
> > > > > associate SOMETHING with each lock owner per file.
> > > > >
> > > > > So I came up with the idea of the FSAL providing the size of
> the
> > > "thing"
> > > > > (which I have currently named fsal_fd, but we can change the
> name
> > > if
> > > > > it would make folks feel more comfortable) with each Ganesha
> > > stateid.
> > > > > And then to bring NFS v3 locks and share reservations into
> the
> > > > > picture, I've made them able to use stateids (well, state_t
> > > > > structures), which also cleaned up part of the SAL lock
> interface
> > > > > (since NFS v3 can hide it's "state" value in the state_t and
> stop
> > > overloading
> > > > state_t *state...
> > > >
> > > > 9P has a state_t per fid (which will translate into per open),
> so
> > > that should
> > > > work well for us as well.
> > >
> > > Yep, that's why I added state_t to 9P. There's actually a
> complication
> > > when an FSAL actually needs both share_fds and lock_fds. I will
> be
> > > looking into that some more.
> > >
> > > > > Now each FSAL gets to define exactly what an fsal_fd actually
> is.
> > > At
> > > > > the moment I have the open mode in the generic structure, but
> > > maybe it
> > > > > can move into the FSAL private fsal_fd.
> > > > >
> > > > > Not all FSALs will use both open and lock fds, and we could
> > > provide
> > > > > separate sizes for them so space would only be allocated when
> > > > > necessary (and if zero, a NULL pointer is passed).
> > > >
> > > > sounds good.
> > > >
> > > > > For most operations, the object_handle, and both a share_fd
> and
> > > > > lock_fd are passed. This allows the FSAL to decide exactly
> which
> > > ones
> > > > > it needs (if it needs a generic "thing" for anonymous I/O it
> can
> > > stash
> > > > > a generic fsal_fd in it's object_handle).
> > > >
> > > > They're passed with what we have and it's left to the FSAL
> function
> > > to open if
> > > > it needs something?
> > > > One of the problem we have at the moment in VFS is fstat()
> racing on
> > > the
> > > > "generic thing", which can be closed/open in other threads.
> > > > Do we leave it up to FSALs to protect their generic bits then ?
> > > (since we can't
> > > > know which FSAL call will use which bit, doing locking
> ourselves
> > > will have to
> > > > be too much for most)
> > >
> > > Note the FSAL_VFS fstat race has been resolved (by protecting the
> fd
> > > inside the FSAL - I plan to continue this migration, allow the
> FSAL to
> > > protect it's file descriptors or whatever they are). And yes, it's
> up
> > > to the FSAL when it actually needs to open something.
> > >
> > > Note that lock_fds can get left open longer than necessary,
> > > particularly in NFS v4.0 since there isn't a way for the client
> to
> > > indicate it's done with the lock stateid (for NFS v3 the CURRENT
> code
> > > will result in a close of the lock_fd anytime the lock owner drops
> to
> > > zero locks on a file - I think we may want to proactively allow
> for
> > > keeping the state_t around a bit longer and thus allow the FSAL
> the
> > > option of also keeping the fd (or whatever) open longer).
> > >
> > > > > The benefit of this interface is that the FSAL can leverage
> cache
> > > > > inode AVL tree and SAL hash tables without making upcalls
> (that
> > > would
> > > > > be fraught with locking issues) or duplicating hash tables in
> > > order to
> > > > > store information entirely separately. It also may open
> > > possibilities
> > > > > to better hinting between the layers so garbage collection can
> be
> > > > improved.
> > > >
> > > > yay. (just need to make sure we "close" anything that has been
> left
> > > open on
> > > > GC, so need a common level flag to say it's used maybe?)
> > >
> > > The fsal_fd that are attached to state_t have a pretty defined
> > > lifetime. The global fsal_fd attached to the object_handle already
> has
> > > a reasonable mechanism in place for managing it's lifetime, though
> we
> > > might want to move that management into the FSAL as mentioned
> above.
> > >
> > > > > In the meantime, the legacy interface is still available. If
> > > truly
> > > > > some FSALs will never need this function, but they want to
> > > benefit
> > > > > from the atomic open/create/setattr capability of FSAL
> open_fd
> > > method,
> > > > > we can make a more generic version of that (well, actually,
> it
> > > just
> > > > > needs to be ok having NULL passed for the fsal_fd).
> > > >
> > > > I think we should make FSALs so that it's always OK whatever is
> > > given, but
> > > > that's a bit more work (esp. internal locking for generic fd as
> > > mentioned
> > > > previously)
> > >
> > > The goal would actually be to eliminate the legacy API. If this
> > > proposed API really doesn't fit some particular FSAL or FSALs, we
> can
> > > consider the best way to converge to a reasonable API (even if
> for
> > > some things, there are two families of calls).
> > >
> > > > Looks like a good draft anyway, I'm sure we can work the
> details.
> > >
> > > Details have changed as I'm moved from my initial API definition
> to
> > > implementation... They will change some more...
> > >
> > > Frank
> > >
> > >
> > >
> ----------------------------------------------------------------------
> > > -------- _______________________________________________
> > > Nfs-ganesha-devel mailing list
> > > Nfs-ganesha-devel@lists.sourceforge.net
> > > https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel
> > 
> > --
> > Matt Benjamin
> > CohortFS, LLC.
> > 315 West Huron Street, Suite 140A
> > Ann Arbor, Michigan 48103
> > 
> > http://cohortfs.com
> > 
> > tel.  734-761-4689
> > fax.  734-769-8938
> > cel.  734-216-5309

-- 
Matt Benjamin
CohortFS, LLC.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://cohortfs.com

tel.  734-761-4689 
fax.  734-769-8938 
cel.  734-216-5309 

------------------------------------------------------------------------------
_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel

Reply via email to