> Hi Frank, list.
> 
> Frank Filz wrote on Mon, Jul 27, 2015 at 09:33:54AM -0700:
> > FSAL_LUSTRE actually also uses fcntl to get POSIX locks, so it has the
> > same issues as FSAL_VFS.
> 
> LUSTRE would actually like one fd per open (or equivalent) if we can reap
> them efficiently, for the same reason as GPFS -- sharing an fd kills
> readahead behaviour, and Lustre needs the client to do readahead (we have
> readahead disabled disk-side), so ganesha performs noticeably worse with
> multiple clients.
> 
> Well, I guess that's close enough to what we're doing anyway.

Fds that are associated with NFS v4 OPEN or NFS v3 SHARE will be closed 
appropriately. If the FSAL didn't need to keep them open, we could enhance the 
cache inode LRU to consider pinned entries for advisory file close. The FSAL 
could also keep track itself; in fact, LRU garbage collection of idle file 
descriptors might belong in the FSAL anyway rather than in cache inode.
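
To make the FSAL-side idea concrete, here is a minimal sketch of LRU garbage 
collection of idle file descriptors. All of the names here (idle_fd, fd_lru_*) 
are hypothetical, not actual Ganesha identifiers, and a real FSAL would of 
course close(2) the descriptor and take a lock around the list:

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

struct idle_fd {
	int fd;                        /* the cached descriptor (-1 if closed) */
	time_t last_use;               /* updated on every I/O */
	struct idle_fd *prev, *next;   /* LRU list linkage */
};

static struct idle_fd *lru_head, *lru_tail;   /* head = most recently used */

/* Add a newly opened fd at the head of the LRU. */
static void fd_lru_insert(struct idle_fd *e, time_t now)
{
	e->last_use = now;
	e->prev = NULL;
	e->next = lru_head;
	if (lru_head)
		lru_head->prev = e;
	lru_head = e;
	if (!lru_tail)
		lru_tail = e;
}

/* Move an fd to the head of the LRU when it is used. */
static void fd_lru_touch(struct idle_fd *e, time_t now)
{
	e->last_use = now;
	if (e == lru_head)
		return;
	/* unlink */
	if (e->prev) e->prev->next = e->next;
	if (e->next) e->next->prev = e->prev;
	if (e == lru_tail) lru_tail = e->prev;
	/* push to head */
	e->prev = NULL;
	e->next = lru_head;
	if (lru_head) lru_head->prev = e;
	lru_head = e;
}

/* Reap every fd idle longer than max_idle, oldest first.
 * Returns the number of descriptors reaped. */
static int fd_lru_reap(time_t now, time_t max_idle)
{
	int reaped = 0;

	while (lru_tail && now - lru_tail->last_use > max_idle) {
		struct idle_fd *victim = lru_tail;

		victim->fd = -1;   /* a real FSAL would close(victim->fd) here */
		lru_tail = victim->prev;
		if (lru_tail)
			lru_tail->next = NULL;
		else
			lru_head = NULL;
		victim->prev = victim->next = NULL;
		reaped++;
	}
	return reaped;
}
```

The point is that the FSAL already sees every use of the fd, so it is in the 
best position to maintain the recency ordering without upcalls.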

> > Now I don't know about other FSALs, though I wouldn't be surprised if
> > at least some other FSALs might benefit from some mechanism to
> > associate SOMETHING with each lock owner per file.
> >
> > So I came up with the idea of the FSAL providing the size of the "thing"
> > (which I have currently named fsal_fd, but we can change the name if
> > it would make folks feel more comfortable) with each Ganesha stateid.
> > And then to bring NFS v3 locks and share reservations into the
> > picture, I've made them able to use stateids (well, state_t
> > structures), which also cleaned up part of the SAL lock interface
> > (since NFS v3 can hide its "state" value in the state_t and stop
> > overloading state_t *state...)
> 
> 9P has a state_t per fid (which will translate into per open), so that should
> work well for us as well.

Yep, that's why I added state_t to 9P. There is a complication when an FSAL 
needs both share_fds and lock_fds; I will be looking into that some more.
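
One way to handle both at once is to have the FSAL report a size for each fd 
kind and allocate the state_t with trailing space for both, passing NULL for a 
kind whose size is zero. This is only an illustrative sketch (state_t_hdr and 
the accessors are made-up names, not the proposed API):

```c
#include <assert.h>
#include <stdlib.h>

struct state_t_hdr {
	size_t share_fd_size;   /* reported by the FSAL, may be 0 */
	size_t lock_fd_size;    /* reported by the FSAL, may be 0 */
	/* FSAL-private bytes follow: share_fd first, then lock_fd */
	char fsal_priv[];
};

/* Allocate a state_t with room for both FSAL-private fd areas. */
static struct state_t_hdr *state_alloc(size_t share_sz, size_t lock_sz)
{
	struct state_t_hdr *s = calloc(1, sizeof(*s) + share_sz + lock_sz);

	if (!s)
		return NULL;
	s->share_fd_size = share_sz;
	s->lock_fd_size = lock_sz;
	return s;
}

/* NULL when the FSAL asked for zero bytes for that fd kind. */
static void *state_share_fd(struct state_t_hdr *s)
{
	return s->share_fd_size ? s->fsal_priv : NULL;
}

static void *state_lock_fd(struct state_t_hdr *s)
{
	return s->lock_fd_size ? s->fsal_priv + s->share_fd_size : NULL;
}
```

This keeps a single allocation per stateid while still letting each FSAL 
decide which areas it needs.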

> > Now each FSAL gets to define exactly what an fsal_fd actually is. At
> > the moment I have the open mode in the generic structure, but maybe it
> > can move into the FSAL private fsal_fd.
> >
> > Not all FSALs will use both open and lock fds, and we could provide
> > separate sizes for them so space would only be allocated when
> > necessary (and if zero, a NULL pointer is passed).
> 
> sounds good.
> 
> > For most operations, the object_handle, and both a share_fd and
> > lock_fd are passed. This allows the FSAL to decide exactly which ones
> > it needs (if it needs a generic "thing" for anonymous I/O it can stash
> > a generic fsal_fd in its object_handle).
> 
> They're passed with what we have and it's left to the FSAL function to open if
> it needs something?
> One of the problems we have at the moment in VFS is fstat() racing on the
> "generic thing", which can be closed/reopened in other threads.
> Do we leave it up to FSALs to protect their generic bits then? (Since we
> can't know which FSAL call will use which bit, doing the locking ourselves
> would be too costly for most.)

Note that the FSAL_VFS fstat race has been resolved (by protecting the fd 
inside the FSAL -- I plan to continue this migration, allowing the FSAL to 
protect its file descriptors, or whatever they are). And yes, it's up to the 
FSAL to decide when it actually needs to open something.

Note that lock_fds can be left open longer than necessary, particularly in NFS 
v4.0, since there is no way for the client to indicate it's done with the lock 
stateid. (For NFS v3, the current code will close the lock_fd any time the 
lock owner drops to zero locks on a file. I think we may want to proactively 
keep the state_t around a bit longer, and thus give the FSAL the option of 
also keeping the fd, or whatever, open longer.)
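
For clarity, the NFS v3 behaviour described above amounts to a per-owner lock 
count that closes the lock_fd when it reaches zero. A toy sketch (all names 
invented for illustration):

```c
#include <assert.h>

struct v3_lock_state {
	int lock_fd;        /* -1 when closed */
	unsigned nlocks;    /* locks this owner holds on the file */
};

static void lock_granted(struct v3_lock_state *s)
{
	s->nlocks++;
}

/* Returns 1 if the lock_fd was closed as a result of this release. */
static int lock_released(struct v3_lock_state *s)
{
	if (--s->nlocks == 0 && s->lock_fd >= 0) {
		/* a real FSAL would close(s->lock_fd) here */
		s->lock_fd = -1;
		return 1;
	}
	return 0;
}
```

Keeping the state_t (and hence the fd) alive a bit past the zero-lock point 
would just mean deferring the close in lock_released() instead of doing it 
immediately.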

> > The benefit of this interface is that the FSAL can leverage cache
> > inode AVL tree and SAL hash tables without making upcalls (that would
> > be fraught with locking issues) or duplicating hash tables in order to
> > store information entirely separately. It also may open possibilities
> > to better hinting between the layers so garbage collection can be
> improved.
> 
> yay. (We just need to make sure we "close" anything that has been left open
> on GC, so maybe we need a common-level flag to say it's in use?)

The fsal_fds that are attached to a state_t have a well-defined lifetime. The 
global fsal_fd attached to the object_handle already has a reasonable mechanism 
in place for managing its lifetime, though we might want to move that 
management into the FSAL as mentioned above.

> > In the meantime, the legacy interface is still available. If truly
> > some FSALs will never need this function, but they want to benefit
> > from the atomic open/create/setattr capability of FSAL open_fd method,
> > we can make a more generic version of that (well, actually, it just
> > needs to be ok having NULL passed for the fsal_fd).
> 
> I think we should make the FSALs such that whatever is given is always OK,
> but that's a bit more work (esp. the internal locking for the generic fd as
> mentioned previously).

The goal would actually be to eliminate the legacy API. If this proposed API 
really doesn't fit some particular FSAL or FSALs, we can consider the best way 
to converge to a reasonable API (even if for some things, there are two 
families of calls).

> Looks like a good draft anyway, I'm sure we can work the details.

Details have changed as I've moved from my initial API definition to 
implementation... They will change some more...

Frank


------------------------------------------------------------------------------
_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel