Hi,

----- "Frank Filz" <ffilz...@mindspring.com> wrote:
> > The problem is that "fd"s are an internal detail that we do not
> > want leaking out into the general server.
> >
> > There are several reasons for this adduced in my and Adam's
> > comments in gerrit.
> >
> > I may be missing something, but what is the generic (non-VFS or few
> > other FSAL-specific) concept that is being represented here? Is it
> > an OPEN complex?
>
> From what I can tell, this is useful to more than just a few FSALs.
> Now maybe "file descriptor" is not the best choice (since FSAL_PROXY
> is going to use the "thing" to stash information about stateids from
> the upstream server (which will be different though perhaps related
> to Ganesha's stateids)).

I think well-defining the abstraction is key. Adam didn't intuit that
the issue was just one of naming, although I agree the name "fd" may
be distracting. But I am more concerned about interfaces. I share
Dan's (and perhaps Adam's) intuition that we'd like to unify around a
smaller number of objects, and to make what the general server is
doing more transparent to the FSAL, when that can make for a cleaner
interface. (We're not trying to hide state from the FSAL, but we might
be interested in avoiding the need to allocate more free-standing
blobs.) So I liked Dan's suggestion to pass states to FSAL calls.
Could we talk more about that idea?

> It's context to associate with share reservations (NFS 4 OPEN, NLM
> SHARE) and lock state. Different FSALs will have different context,
> but perhaps not all FSALs will need the context.
>
> But having a "few" FSALs need an API also isn't a good reason to
> turn down that API.
>
> I am working to make this "thing" as generic as makes sense. If
> there's some FSAL that has needs that I haven't considered, then
> please share those needs rather than shooting down ideas.

I'm very happy to share needs, but I care about implementation
decisions, too.

Matt

> Frank
>
> > > > Hi Frank, list.
> > > > Frank Filz wrote on Mon, Jul 27, 2015 at 09:33:54AM -0700:
> > > > > FSAL_LUSTRE actually also uses fcntl to get POSIX locks, so
> > > > > it has the same issues as FSAL_VFS.
> > > >
> > > > LUSTRE would actually like one fd per open or equivalent if we
> > > > can reap them efficiently, for the same reason as GPFS --
> > > > sharing an fd kills readahead behaviour, and Lustre needs the
> > > > client to do readahead and has us disable readahead disk-side,
> > > > so Ganesha performs quite a bit worse with multiple clients.
> > > >
> > > > Well, I guess that's close enough to what we're doing anyway.
> > >
> > > Fds that are associated with NFS 4 OPEN or NFS 3 SHARE will be
> > > closed appropriately. If the FSAL didn't need to keep them open,
> > > we could enhance the cache inode LRU to be able to consider
> > > pinned entries for advisory file close. The FSAL could also try
> > > to do something to keep track, and, I dunno, LRU garbage
> > > collection of idle file descriptors might actually belong in the
> > > FSAL anyway instead of cache inode.
> > >
> > > > > Now I don't know about other FSALs, though I wouldn't be
> > > > > surprised if at least some other FSALs might benefit from
> > > > > some mechanism to associate SOMETHING with each lock owner
> > > > > per file.
> > > > >
> > > > > So I came up with the idea of the FSAL providing the size of
> > > > > the "thing" (which I have currently named fsal_fd, but we can
> > > > > change the name if it would make folks feel more comfortable)
> > > > > with each Ganesha stateid. And then to bring NFS v3 locks and
> > > > > share reservations into the picture, I've made them able to
> > > > > use stateids (well, state_t structures), which also cleaned
> > > > > up part of the SAL lock interface (since NFS v3 can hide its
> > > > > "state" value in the state_t and stop overloading state_t
> > > > > *state...
> > > > 9P has a state_t per fid (which will translate into one per
> > > > open), so that should work well for us as well.
> > >
> > > Yep, that's why I added state_t to 9P. There's actually a
> > > complication when an FSAL needs both share_fds and lock_fds. I
> > > will be looking into that some more.
> > >
> > > > > Now each FSAL gets to define exactly what an fsal_fd actually
> > > > > is. At the moment I have the open mode in the generic
> > > > > structure, but maybe it can move into the FSAL-private
> > > > > fsal_fd.
> > > > >
> > > > > Not all FSALs will use both open and lock fds, and we could
> > > > > provide separate sizes for them so space would only be
> > > > > allocated when necessary (and if zero, a NULL pointer is
> > > > > passed).
> > > >
> > > > Sounds good.
> > > >
> > > > > For most operations, the object_handle and both a share_fd
> > > > > and lock_fd are passed. This allows the FSAL to decide
> > > > > exactly which ones it needs (if it needs a generic "thing"
> > > > > for anonymous I/O it can stash a generic fsal_fd in its
> > > > > object_handle).
> > > >
> > > > They're passed with what we have, and it's left to the FSAL
> > > > function to open if it needs something?
> > > > One of the problems we have at the moment in VFS is fstat()
> > > > racing on the "generic thing", which can be closed/opened in
> > > > other threads.
> > > > Do we leave it up to FSALs to protect their generic bits then?
> > > > (Since we can't know which FSAL call will use which bit, doing
> > > > the locking ourselves would be too much for most.)
> > >
> > > Note the FSAL_VFS fstat race has been resolved (by protecting the
> > > fd inside the FSAL - I plan to continue this migration, allowing
> > > the FSAL to protect its file descriptors or whatever they are).
> > > And yes, it's up to the FSAL when it actually needs to open
> > > something.
> > > Note that lock_fds can get left open longer than necessary,
> > > particularly in NFS v4.0 since there isn't a way for the client
> > > to indicate it's done with the lock stateid (for NFS v3 the
> > > current code will result in a close of the lock_fd anytime the
> > > lock owner drops to zero locks on a file - I think we may want to
> > > proactively allow for keeping the state_t around a bit longer and
> > > thus allow the FSAL the option of also keeping the fd (or
> > > whatever) open longer).
> > >
> > > > > The benefit of this interface is that the FSAL can leverage
> > > > > the cache inode AVL tree and SAL hash tables without making
> > > > > upcalls (that would be fraught with locking issues) or
> > > > > duplicating hash tables in order to store information
> > > > > entirely separately. It also may open possibilities for
> > > > > better hinting between the layers so garbage collection can
> > > > > be improved.
> > > >
> > > > Yay. (Just need to make sure we "close" anything that has been
> > > > left open on GC, so maybe we need a common-level flag to say
> > > > it's in use?)
> > >
> > > The fsal_fds that are attached to a state_t have a pretty
> > > well-defined lifetime. The global fsal_fd attached to the
> > > object_handle already has a reasonable mechanism in place for
> > > managing its lifetime, though we might want to move that
> > > management into the FSAL as mentioned above.
> > >
> > > > > In the meantime, the legacy interface is still available. If
> > > > > truly some FSALs will never need this function, but they want
> > > > > to benefit from the atomic open/create/setattr capability of
> > > > > the FSAL open_fd method, we can make a more generic version
> > > > > of that (well, actually, it just needs to be OK having NULL
> > > > > passed for the fsal_fd).
> > > >
> > > > I think we should make FSALs so that it's always OK whatever is
> > > > given, but that's a bit more work (esp.
> > > > internal locking for the generic fd as mentioned previously).
> > >
> > > The goal would actually be to eliminate the legacy API. If this
> > > proposed API really doesn't fit some particular FSAL or FSALs, we
> > > can consider the best way to converge to a reasonable API (even
> > > if, for some things, there are two families of calls).
> > >
> > > > Looks like a good draft anyway, I'm sure we can work out the
> > > > details.
> > >
> > > Details have changed as I've moved from my initial API definition
> > > to implementation... They will change some more...
> > >
> > > Frank

--
Matt Benjamin
CohortFS, LLC.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://cohortfs.com

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309

------------------------------------------------------------------------------
_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel