We may need to devote a concall to discussing this work, but I'm going to try a modest-length discussion of the approach I'm taking and why.
Ultimately, this effort got kicked off due to the stupid POSIX lock behavior, which really only impacts FSAL_VFS. But then, when we started talking about possibly not forcing a one-to-one mapping between file descriptors and object handles, Marc Eshel suggested he could benefit from file descriptors associated with individual NFS v4 OPENs so that GPFS's I/O prediction could do better. Then Open File Descriptor (OFD) locks came along, which open FSAL_VFS up to a much improved lock interface beyond just dodging the stupid POSIX lock behavior (a quick example is below). And as I got into thinking, I realized: hey, to get full share reservation and lock support in FSAL_PROXY, we need to be able to associate open and lock stateids with Ganesha stateids.

So now we have:

- FSAL_GPFS would like to have a file descriptor per OPEN (well, really per OPEN stateid, tracking OPEN upgrade and OPEN_DOWNGRADE), and maybe one per NFS v3 client.
- FSAL_VFS would like to have a file descriptor per lock owner per file.
- FSAL_PROXY would like to have a stateid per OPEN stateid and LOCK stateid.
- FSAL_LUSTRE also uses fcntl to get POSIX locks, so it has the same issues as FSAL_VFS.

I don't know about other FSALs, though I wouldn't be surprised if at least some of them might benefit from some mechanism to associate SOMETHING with each lock owner per file.

So I came up with the idea of the FSAL providing the size of the "thing" (which I have currently named fsal_fd, but we can change the name if it would make folks feel more comfortable) to be allocated with each Ganesha stateid. And to bring NFS v3 locks and share reservations into the picture, I've made them able to use stateids (well, state_t structures), which also cleaned up part of the SAL lock interface (since NFS v3 can hide its "state" value in the state_t and stop overloading the state_t *state parameter).

Each FSAL gets to define exactly what an fsal_fd actually is (a rough sketch is below). At the moment I have the open mode in the generic structure, but maybe it can move into the FSAL-private fsal_fd. Not all FSALs will use both open and lock fds, and we could provide separate sizes for them so space would only be allocated when necessary (and if a size is zero, a NULL pointer is passed).

For most operations, the object_handle and both a share_fd and a lock_fd are passed (an illustrative signature is below). This allows the FSAL to decide exactly which ones it needs; if it needs a generic "thing" for anonymous I/O, it can stash a generic fsal_fd in its object_handle.

The benefit of this interface is that the FSAL can leverage the cache inode AVL tree and the SAL hash tables without making upcalls (which would be fraught with locking issues) and without duplicating hash tables in order to store information entirely separately. It may also open possibilities for better hinting between the layers so garbage collection can be improved.

In the meantime, the legacy interface is still available. If some FSALs truly will never need this function, but they want to benefit from the atomic open/create/setattr capability of the FSAL open_fd method, we can make a more generic version of that (well, actually, it just needs to be OK with having NULL passed for the fsal_fd).
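To make the fsal_fd idea concrete, here's a rough sketch. This is illustrative only, not the final interface; the field and struct names are placeholders, though fsal_openflags_t is the existing Ganesha open-flags type:

    /* Generic part, embedded at the start of each FSAL-private fd. */
    struct fsal_fd {
            fsal_openflags_t openflags;     /* open mode, for now */
    };

    /* What FSAL_VFS might define; the FSAL would report
     * sizeof(struct vfs_fd) so the space can be allocated along
     * with each Ganesha stateid (or state_t for NFS v3).
     */
    struct vfs_fd {
            struct fsal_fd fsal_fd;         /* generic part first */
            int fd;                         /* the kernel file descriptor */
    };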
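And an operation in the new interface might look something like the following. Again, purely illustrative; read2 is a placeholder name, not a committed method, and either fd pointer can be NULL when the FSAL reported a zero size for it:

    fsal_status_t (*read2)(struct fsal_obj_handle *obj_hdl,
                           struct fsal_fd *share_fd,   /* may be NULL */
                           struct fsal_fd *lock_fd,    /* may be NULL */
                           uint64_t offset,
                           size_t buffer_size,
                           void *buffer,
                           size_t *read_amount,
                           bool *end_of_file);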
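For reference, since OFD locks came up above, here's a minimal sketch of their use (Linux-specific, kernel 3.15+). Unlike classic POSIX locks, an OFD lock is owned by the open file description rather than the process, so closing one fd on a file does not drop locks taken through another fd, and locks on different open file descriptions conflict with each other. That's what would let FSAL_VFS keep one fd per lock owner per file:

    #define _GNU_SOURCE
    #include <fcntl.h>

    /* Take a non-blocking write lock on [start, start+len) using an
     * OFD lock.  l_pid must be 0 for OFD locks.
     */
    static int ofd_write_lock(int fd, off_t start, off_t len)
    {
            struct flock fl = {
                    .l_type = F_WRLCK,
                    .l_whence = SEEK_SET,
                    .l_start = start,
                    .l_len = len,
                    .l_pid = 0,     /* required for OFD locks */
            };

            /* F_OFD_SETLKW would block instead of failing */
            return fcntl(fd, F_OFD_SETLK, &fl);
    }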
I'm also thinking that eventually the cache inode content_lock will no longer be what protects the generic "thing"; FSAL_VFS is already using the object_handle lock to protect access to the fd associated with the object_handle. Removing the need to hold the content_lock would mean FSAL_VFS could actually manage its own count of open file descriptors and do some management of them (maybe still in conjunction with cache inode, to benefit from the LRU table). For example, a setattr or getattr call could then result in an open fd that FSAL_VFS would not immediately close.

There is also an eventual goal of getting rid of the insane logic SAL has to manage lock state when the FSAL supports locks but does not support lock owners. This logic is not too bad on LOCK requests, but UNLOCK requires walking the entire lock list for a file to figure out what portions of the UNLOCK request are held by other lock owners, since the unlocked range has to be punched out around each region some other owner still holds. This can result in N+1 lock requests (and memory objects) to perform the unlock, where N is the number of locks held by other lock owners.

Frank