> The problem is that "fd"s are an internal detail that we do not want
> leaking out into the general server.
>
> There are several reasons for this adduced in my and Adam's comments
> in gerrit.
>
> I may be missing something, but what is the generic (non-VFS or few
> other FSAL-specific) concept that is being represented here? Is it an
> OPEN complex?
From what I can tell, this is useful to more than just a few FSALs. Now
maybe "file descriptor" is not the best choice of name (since FSAL_PROXY
is going to use the "thing" to stash information about stateids from the
upstream server, which will be different from, though perhaps related
to, Ganesha's stateids). It's context to associate with share
reservations (NFS v4 OPEN, NLM SHARE) and with lock state. Different
FSALs will have different context, and perhaps not all FSALs will need
the context. But only a "few" FSALs needing an API isn't a good reason
to turn that API down.

I am working to make this "thing" as generic as makes sense. If there's
some FSAL that has needs I haven't considered, then please share those
needs rather than shooting down ideas.

Frank

> > > Hi Frank, list.
> > >
> > > Frank Filz wrote on Mon, Jul 27, 2015 at 09:33:54AM -0700:
> > > > FSAL_LUSTRE actually also uses fcntl to get POSIX locks, so it
> > > > has the same issues as FSAL_VFS.
> > >
> > > LUSTRE would actually like one fd per open or equivalent if we can
> > > reap them efficiently, for the same reason as GPFS -- sharing an fd
> > > kills readahead behaviour, and Lustre needs the client to do the
> > > readahead (it has us disable readahead disk-side), so Ganesha
> > > performs quite a bit worse with multiple clients.
> >
> > Well, I guess that's close enough to what we're doing anyway.
> >
> > Fds that are associated with NFS v4 OPEN or NFS v3 SHARE will be
> > closed appropriately. If the FSAL didn't need to keep them open, we
> > could enhance the cache inode LRU to be able to consider pinned
> > entries for advisory file close. The FSAL could also keep track
> > itself; LRU garbage collection of idle file descriptors might
> > actually belong in the FSAL anyway instead of in cache inode.
> >
> > > > Now I don't know about other FSALs, though I wouldn't be
> > > > surprised if at least some other FSALs might benefit from some
> > > > mechanism to associate SOMETHING with each lock owner per file.
> > > >
> > > > So I came up with the idea of the FSAL providing the size of the
> > > > "thing" (which I have currently named fsal_fd, but we can change
> > > > the name if it would make folks feel more comfortable) allocated
> > > > with each Ganesha stateid. And then, to bring NFS v3 locks and
> > > > share reservations into the picture, I've made them able to use
> > > > stateids (well, state_t structures), which also cleaned up part
> > > > of the SAL lock interface (since NFS v3 can hide its "state"
> > > > value in the state_t and stop overloading state_t *state)...
> > >
> > > 9P has a state_t per fid (which will translate into one per open),
> > > so that should work well for us as well.
> >
> > Yep, that's why I added state_t to 9P. There's actually a
> > complication when an FSAL needs both share_fds and lock_fds. I will
> > be looking into that some more.
> >
> > > > Now each FSAL gets to define exactly what an fsal_fd actually is.
> > > > At the moment I have the open mode in the generic structure, but
> > > > maybe it can move into the FSAL-private fsal_fd.
> > > >
> > > > Not all FSALs will use both open and lock fds, and we could
> > > > provide separate sizes for them so space would only be allocated
> > > > when necessary (and if a size is zero, a NULL pointer is passed).
> > >
> > > sounds good.
> >
> > > > For most operations, the object_handle and both a share_fd and a
> > > > lock_fd are passed. This allows the FSAL to decide exactly which
> > > > ones it needs (if it needs a generic "thing" for anonymous I/O,
> > > > it can stash a generic fsal_fd in its object_handle).
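To make the shape of that concrete, here is a minimal sketch of the
generic/private split and the size-driven allocation. Only the fsal_fd
name and the open-mode field come from the discussion above; everything
else (vfs_fd, fsal_fd_sizes, the stub typedef) is illustrative, not the
actual patch:

/* Sketch only -- fsal_openflags_t stands in for the real definition
 * in fsal_types.h.
 */
#include <stddef.h>
#include <stdint.h>

typedef uint16_t fsal_openflags_t;	/* stub for fsal_types.h */

/* Generic part, common to all FSALs.  The open mode lives here for
 * now, though it could move into the FSAL-private portion.
 */
struct fsal_fd {
	fsal_openflags_t openflags;	/* read/write/etc. open mode */
};

/* FSAL-private variant, e.g. for a POSIX-fd based FSAL like FSAL_VFS.
 * The generic part comes first so pointers can be safely downcast.
 */
struct vfs_fd {
	struct fsal_fd fsal_fd;		/* must be first */
	int fd;				/* the real file descriptor */
};

/* Hypothetical per-FSAL size export: SAL would allocate this much
 * space alongside each state_t.  A size of zero means the FSAL needs
 * no context of that kind, and NULL is passed to its methods instead.
 */
struct fsal_fd_sizes {
	size_t share_fd_size;	/* context for OPEN / NLM SHARE state */
	size_t lock_fd_size;	/* context for byte-range lock state */
};

Under this scheme FSAL_PROXY could, for instance, declare a share_fd
that holds an upstream stateid rather than a POSIX fd, without SAL ever
knowing the difference.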
> > > They're passed along with what we already have, and it's left to
> > > the FSAL function to open if it needs something?
> > > One of the problems we have at the moment in VFS is fstat() racing
> > > on the "generic thing", which can be closed/opened in other
> > > threads.
> > > Do we leave it up to FSALs to protect their generic bits then?
> > > (since we can't know which FSAL call will use which bit, doing the
> > > locking ourselves would be too much for most)
> >
> > Note the FSAL_VFS fstat race has been resolved (by protecting the fd
> > inside the FSAL - I plan to continue this migration, allowing the
> > FSAL to protect its file descriptors or whatever they are). And yes,
> > it's up to the FSAL when it actually needs to open something.
> >
> > Note that lock_fds can get left open longer than necessary,
> > particularly in NFS v4.0, since there isn't a way for the client to
> > indicate it's done with the lock stateid (for NFS v3 the current
> > code will result in a close of the lock_fd any time the lock owner
> > drops to zero locks on a file - I think we may want to proactively
> > allow for keeping the state_t around a bit longer and thus allow the
> > FSAL the option of also keeping the fd (or whatever) open longer).
> >
> > > > The benefit of this interface is that the FSAL can leverage the
> > > > cache inode AVL tree and SAL hash tables without making upcalls
> > > > (which would be fraught with locking issues) or duplicating hash
> > > > tables in order to store information entirely separately. It may
> > > > also open possibilities for better hinting between the layers so
> > > > garbage collection can be improved.
> > >
> > > yay. (just need to make sure we "close" anything that has been
> > > left open on GC, so maybe we need a common-level flag to say it's
> > > in use?)
> >
> > The fsal_fds that are attached to a state_t have a pretty well
> > defined lifetime. The global fsal_fd attached to the object_handle
> > already has a reasonable mechanism in place for managing its
> > lifetime, though we might want to move that management into the
> > FSAL as mentioned above.
> >
> > > > In the meantime, the legacy interface is still available. If
> > > > truly some FSALs will never need this function, but they want to
> > > > benefit from the atomic open/create/setattr capability of the
> > > > FSAL open_fd method, we can make a more generic version of that
> > > > (well, actually, it just needs to be OK having NULL passed for
> > > > the fsal_fd).
> > >
> > > I think we should make the FSALs so that it's always OK whatever
> > > is given, but that's a bit more work (esp. the internal locking
> > > for the generic fd, as mentioned previously)
> >
> > The goal would actually be to eliminate the legacy API. If this
> > proposed API really doesn't fit some particular FSAL or FSALs, we
> > can consider the best way to converge on a reasonable API (even if,
> > for some things, there are two families of calls).
> >
> > > Looks like a good draft anyway, I'm sure we can work out the
> > > details.
> >
> > Details have changed as I've moved from my initial API definition to
> > implementation... They will change some more...
> >
> > Frank
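On the fstat() race raised above: "protecting the fd inside the FSAL"
could look roughly like the sketch below. The names (vfs_obj_handle,
fd_lock, global_fd) and the rwlock scheme are assumptions for
illustration; the actual FSAL_VFS fix may differ. Readers of the global
fd take the lock shared, while close/reopen paths take it exclusive, so
a stat can no longer race with a close in another thread:

#include <pthread.h>
#include <sys/stat.h>
#include <unistd.h>
#include <errno.h>

struct vfs_obj_handle {
	pthread_rwlock_t fd_lock;	/* guards global_fd below */
	int global_fd;			/* -1 when closed */
};

/* Concurrent stats share the lock, but can no longer observe the fd
 * mid close/reopen.
 */
static int vfs_stat_locked(struct vfs_obj_handle *hdl, struct stat *st)
{
	int rc = 0;

	pthread_rwlock_rdlock(&hdl->fd_lock);
	if (hdl->global_fd < 0)
		rc = -EBADF;		/* caller must (re)open first */
	else if (fstat(hdl->global_fd, st) < 0)
		rc = -errno;
	pthread_rwlock_unlock(&hdl->fd_lock);
	return rc;
}

/* Close (and any reopen) takes the lock exclusive. */
static void vfs_close_locked(struct vfs_obj_handle *hdl)
{
	pthread_rwlock_wrlock(&hdl->fd_lock);
	if (hdl->global_fd >= 0) {
		close(hdl->global_fd);
		hdl->global_fd = -1;
	}
	pthread_rwlock_unlock(&hdl->fd_lock);
}

A lock like this could also give the FSAL a natural place to hang its
own LRU/garbage collection of idle fds, per the earlier point about
moving that management out of cache inode.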
> --
> Matt Benjamin
> CohortFS, LLC.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
>
> http://cohortfs.com
>
> tel. 734-761-4689
> fax. 734-769-8938
> cel. 734-216-5309

------------------------------------------------------------------------------
_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel