> > > The problem is that "fd"s are an internal detail that we do not want
> > > leaking out into the general server.
> > >
> > > There are several reasons for this adduced in my and Adam's comments
> > > in gerrit.
> > >
> > > I may be missing something, but what is the generic (non-VFS or few
> > > other FSAL-specific) concept that is being represented here?  Is it
> > > an OPEN complex?
> >
> > From what I can tell, this is useful to more than just a few FSALs.
> > Now maybe "file descriptor" is not the best choice (since FSAL_PROXY
> > is going to use the "thing" to stash information about stateids from
> > the upstream server (which will be different though perhaps related to
> > Ganesha's stateids)).
> 
> I think well-defining the abstraction is key.  Adam didn't intuit that
> the issue was just one of naming, although I agree the name "fd" may be
> distracting.
> 
> But I am more concerned about interfaces.  I share Dan's (and perhaps
> Adam's) intuition that we'd like to unify around a smaller number of
> objects, and to make what the general server is doing more transparent
> to the FSAL when that makes for a cleaner interface.  (We're not trying
> to hide state from the FSAL, but we might be interested in avoiding the
> need to allocate more free-standing blobs.)  So I liked Dan's suggestion
> to pass states to FSAL calls.  Could we talk more about that idea?

I was avoiding the free-standing blob (at least for NFS; for 9P it was not
so easy), so for NFS state_t objects, the fsal_fd is allocated along with
the state_t.

The FSAL has to be able to indicate how much space it needs.
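
(Concretely, what I'm doing looks roughly like this -- sketch only;
state_fd_size and state_fd are placeholder names here, not necessarily
what's in the patch:

    /* FSAL advertises how much per-state fd space it needs
     * (0 means it needs none). */
    size_t extra = exp->fsal->state_fd_size;

    /* SAL allocates the state_t and the FSAL's space in one chunk,
     * so there is no separate free-standing allocation. */
    struct state_t *state = gsh_calloc(1, sizeof(*state) + extra);

    state->state_fd = extra != 0 ? (struct fsal_fd *)(state + 1) : NULL;

where gsh_calloc is just Ganesha's usual allocator.)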

Yes, we COULD pass the state_t (it still needs to allocate some space for
the FSAL somehow) instead of the fsal_fd. The question I have is what in
there is of value to the FSAL. If there really is something of value,
then sure, let's pass the state_t. Without something of value, though,
I'm wary of exposing state_t internals to the FSAL (for one thing,
they're protected by the state_lock, which the FSAL normally doesn't have
access to).
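
(To make the two alternatives concrete -- rough sketch, names
illustrative rather than the actual patch:

    /* What I have now: the FSAL sees only its own blob. */
    fsal_status_t (*lock_op2)(struct fsal_obj_handle *obj,
                              struct fsal_fd *lock_fd, ...);

    /* Dan's suggestion: pass the whole state and let the FSAL find
     * its blob inside it. */
    fsal_status_t (*lock_op2)(struct fsal_obj_handle *obj,
                              struct state_t *state, ...);

The second form only pays off if there is something else in the state_t
the FSAL can safely use without taking the state_lock.)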

However, there is another hitch... When doing an open/create, we don't
have a cache entry or state_t yet (which currently means an fsal_fd needs
to be copyable).
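
(Roughly what that looks like -- sketch only, helper names hypothetical:

    /* No state_t exists yet at open/create time, so build the fd in a
     * temporary buffer... */
    char buf[128];                    /* big enough for the FSAL's fd_size */
    struct fsal_fd *tmp_fd = (struct fsal_fd *)buf;

    status = obj->obj_ops.open_fd(obj, openflags, tmp_fd, ...);

    /* ...and once the state_t exists, copy it into place. */
    memcpy(state->state_fd, tmp_fd, fd_size);

So whatever the FSAL keeps in there must tolerate a plain memcpy.)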

> > It's context to associate with share reservations (NFS 4 OPEN, NLM
> > SHARE) and lock state. Different FSALs will have different context,
> > but perhaps not all FSALs will need the context.
> >
> > But having a "few" FSALs need an API also isn't a good reason to turn
> > down that API.
> >
> > I am working to make this "thing" as generic as makes sense. If
> > there's some FSAL that has needs that I haven't considered, then
> > please share those needs rather than shooting down ideas.
> 
> I'm very happy to share needs, but I care about implementation decisions,
> too.

Fair enough. I'm sure that even as I work through what I am directly
implementing there will be changes to the API to improve things. And even
once it gets checked in, there might be a subsequent round of
improvements.

I'd love to understand how other FSALs do or don't fit into this concept, and 
how well. Details of needs can only improve the solution.

Thanks

Frank

> > > > > Hi Frank, list.
> > > > >
> > > > > Frank Filz wrote on Mon, Jul 27, 2015 at 09:33:54AM -0700:
> > > > > > FSAL_LUSTRE actually also uses fcntl to get POSIX locks, so it
> > > > > > has the same issues as FSAL_VFS.
> > > > >
> > > > > LUSTRE would actually like one fd per open or equivalent, if we
> > > > > can reap them efficiently, for the same reason as GPFS --
> > > > > sharing an fd kills readahead behaviour, and Lustre needs the
> > > > > client to do readahead (it has us disable readahead disk-side),
> > > > > so Ganesha performs quite a bit worse with multiple clients.
> > > > >
> > > > > Well, I guess that's close enough to what we're doing anyway.
> > > >
> > > > Fds that are associated with NFS 4 OPEN or NFS 3 SHARE will be
> > > > closed appropriately. If the FSAL didn't need to keep them open,
> > > > we could enhance the cache inode LRU to be able to consider pinned
> > > > entries for advisory file close. The FSAL could also try to keep
> > > > track itself; in fact, LRU garbage collection of idle file
> > > > descriptors might actually belong in the FSAL anyway instead of in
> > > > cache inode.
> > > >
> > > > > > Now I don't know about other FSALs, though I wouldn't be
> > > > > > surprised if at least some other FSALs might benefit from some
> > > > > > mechanism to associate SOMETHING with each lock owner per file.
> > > > > >
> > > > > > So I came up with the idea of the FSAL providing the size of
> > > > > > the "thing" (which I have currently named fsal_fd, but we can
> > > > > > change the name if it would make folks feel more comfortable)
> > > > > > with each Ganesha stateid. And then, to bring NFS v3 locks and
> > > > > > share reservations into the picture, I've made them able to
> > > > > > use stateids (well, state_t structures), which also cleaned up
> > > > > > part of the SAL lock interface (since NFS v3 can hide its
> > > > > > "state" value in the state_t and stop overloading
> > > > > > state_t *state...
> > > > >
> > > > > 9P has a state_t per fid (which will translate into one per
> > > > > open), so that should work well for us as well.
> > > >
> > > > Yep, that's why I added state_t to 9P. There's actually a
> > > > complication when an FSAL needs both share_fds and lock_fds. I
> > > > will be looking into that some more.
> > > >
> > > > > > Now each FSAL gets to define exactly what an fsal_fd actually
> > > > > > is. At the moment I have the open mode in the generic
> > > > > > structure, but maybe it can move into the FSAL-private
> > > > > > fsal_fd.
> > > > > >
> > > > > > Not all FSALs will use both open and lock fds, and we could
> > > > > > provide separate sizes for them so space would only be
> > > > > > allocated when necessary (and if zero, a NULL pointer is
> > > > > > passed).
> > > > >
> > > > > sounds good.
> > > > >
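
(A rough sketch of the shape I have in mind -- FSAL_VFS-flavoured, and
the names are illustrative rather than what's actually in the patch:

    /* Generic part, visible to the general server. */
    struct fsal_fd {
        fsal_openflags_t openflags;   /* open mode; may move FSAL-side */
    };

    /* Each FSAL wraps it with whatever private context it needs. */
    struct vfs_fd {
        struct fsal_fd fsal_fd;       /* generic part comes first */
        int fd;                       /* the actual POSIX descriptor */
    };

    /* Separate sizes let an FSAL opt out of one or the other, e.g.
     * share_fd_size = sizeof(struct vfs_fd) and lock_fd_size = 0
     * means NULL gets passed for lock_fds. */

FSAL_PROXY would instead stash its upstream stateid info in its private
wrapper.)
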
> > > > > > For most operations, the object_handle and both a share_fd
> > > > > > and lock_fd are passed. This allows the FSAL to decide exactly
> > > > > > which ones it needs (if it needs a generic "thing" for
> > > > > > anonymous I/O it can stash a generic fsal_fd in its
> > > > > > object_handle).
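
(For example, a read call might look something like this -- sketch only,
not the actual method signature:

    fsal_status_t (*read_fd)(struct fsal_obj_handle *obj,
                             struct fsal_fd *share_fd,  /* may be NULL */
                             struct fsal_fd *lock_fd,   /* may be NULL */
                             uint64_t offset, size_t len, void *buf,
                             size_t *read_amount);

The FSAL uses whichever fd it cares about, or falls back to the global
one in its object_handle for anonymous I/O.)
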
> > > > >
> > > > > They're passed with what we have, and it's left to the FSAL
> > > > > function to open if it needs something?
> > > > > One of the problems we have at the moment in VFS is fstat()
> > > > > racing on the "generic thing", which can be closed/opened in
> > > > > other threads.
> > > > > Do we leave it up to FSALs to protect their generic bits, then?
> > > > > (Since we can't know which FSAL call will use which bit, doing
> > > > > the locking ourselves would be too much for most.)
> > > >
> > > > Note the FSAL_VFS fstat race has been resolved (by protecting the
> > > > fd inside the FSAL - I plan to continue this migration, allowing
> > > > the FSAL to protect its file descriptors or whatever they are).
> > > > And yes, it's up to the FSAL when it actually needs to open
> > > > something.
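
(The fix in FSAL_VFS is essentially a reader/writer lock around the fd --
sketched here from memory, so treat the field names as illustrative:

    /* In the FSAL's private object handle: */
    pthread_rwlock_t fdlock;   /* guards fd against close/reopen races */
    int fd;

    /* Readers take the lock shared so the fd can't be closed/reopened
     * under them; the close/reopen path takes it exclusive. */
    PTHREAD_RWLOCK_rdlock(&myself->u.file.fdlock);
    retval = fstat(myself->u.file.fd, &st);
    PTHREAD_RWLOCK_unlock(&myself->u.file.fdlock);

Other FSALs can do whatever suits their own "fd" equivalents.)
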
> > > >
> > > > Note that lock_fds can get left open longer than necessary,
> > > > particularly in NFS v4.0 since there isn't a way for the client
> > > > to indicate it's done with the lock stateid (for NFS v3 the
> > > > current code will close the lock_fd any time the lock owner drops
> > > > to zero locks on a file - I think we may want to proactively
> > > > allow for keeping the state_t around a bit longer and thus allow
> > > > the FSAL the option of also keeping the fd (or whatever) open
> > > > longer).
> > > >
> > > > > > The benefit of this interface is that the FSAL can leverage
> > > > > > the cache inode AVL tree and SAL hash tables without making
> > > > > > upcalls (which would be fraught with locking issues) or
> > > > > > duplicating hash tables in order to store information
> > > > > > entirely separately. It also may open possibilities for
> > > > > > better hinting between the layers so garbage collection can
> > > > > > be improved.
> > > > >
> > > > > yay. (We just need to make sure we "close" anything that has
> > > > > been left open on GC, so maybe we need a common-level flag to
> > > > > say it's in use?)
> > > >
> > > > The fsal_fds that are attached to state_t have a pretty
> > > > well-defined lifetime. The global fsal_fd attached to the
> > > > object_handle already has a reasonable mechanism in place for
> > > > managing its lifetime, though we might want to move that
> > > > management into the FSAL as mentioned above.
> > > >
> > > > > > In the meantime, the legacy interface is still available. If
> > > > > > some FSALs truly will never need this function but want to
> > > > > > benefit from the atomic open/create/setattr capability of the
> > > > > > FSAL open_fd method, we can make a more generic version of
> > > > > > that (well, actually, it just needs to be OK having NULL
> > > > > > passed for the fsal_fd).
> > > > >
> > > > > I think we should make FSALs so that it's always OK whatever
> > > > > is given, but that's a bit more work (esp. internal locking for
> > > > > the generic fd as mentioned previously).
> > > >
> > > > The goal would actually be to eliminate the legacy API. If this
> > > > proposed API really doesn't fit some particular FSAL or FSALs, we
> > > > can consider the best way to converge to a reasonable API (even
> > > > if, for some things, there are two families of calls).
> > > >
> > > > > Looks like a good draft anyway; I'm sure we can work out the
> > > > > details.
> > > >
> > > > Details have changed as I've moved from my initial API definition
> > > > to implementation... They will change some more...
> > > >
> > > > Frank
> > > >
> > > >
> > > >
> > >
> 
> --
> Matt Benjamin
> CohortFS, LLC.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
> 
> http://cohortfs.com
> 
> tel.  734-761-4689
> fax.  734-769-8938
> cel.  734-216-5309

