We may need to devote a concall to discussing this work, but I'm going to try a modest-length discussion of the approach I'm taking and why.
Ultimately, this effort got kicked off due to the stupid POSIX lock behavior, which really only impacts FSAL_VFS. But then, when we started talking about possibly not forcing a one-to-one mapping between file descriptors and object handles, Marc Eshel suggested he could benefit from file descriptors associated with individual NFS v4 OPENs so that GPFS's I/O prediction could do better. Then Open File Descriptor (OFD) locks came along, which open FSAL_VFS up to a much improved lock interface beyond just dodging the stupid POSIX lock behavior (a quick example is below). And as I got into thinking, I realized: hey, to get full share reservation and lock support in FSAL_PROXY, we need to be able to associate open and lock stateids with Ganesha stateids.

So now we have:

- FSAL_GPFS would like to have a file descriptor per OPEN (well, really per OPEN stateid, tracking OPEN upgrade and OPEN_DOWNGRADE), and maybe one per NFS v3 client.
- FSAL_VFS would like to have a file descriptor per lock owner per file.
- FSAL_PROXY would like to have a stateid per OPEN stateid and LOCK stateid.
- FSAL_LUSTRE also uses fcntl to get POSIX locks, so it has the same issues as FSAL_VFS.

I don't know about other FSALs, though I wouldn't be surprised if at least some of them might benefit from some mechanism to associate SOMETHING with each lock owner per file.

So I came up with the idea of the FSAL providing the size of the "thing" (which I have currently named fsal_fd, but we can change the name if it would make folks feel more comfortable) to be allocated with each Ganesha stateid. And to bring NFS v3 locks and share reservations into the picture, I've made them able to use stateids (well, state_t structures), which also cleaned up part of the SAL lock interface (since NFS v3 can hide its "state" value in the state_t and stop overloading the state_t *state parameter).

Each FSAL gets to define exactly what an fsal_fd actually is (a rough sketch is below). At the moment I have the open mode in the generic structure, but maybe it can move into the FSAL-private fsal_fd. Not all FSALs will use both open and lock fds, and we could provide separate sizes for them so space would only be allocated when necessary (and if a size is zero, a NULL pointer is passed).

For most operations, the object_handle and both a share_fd and a lock_fd are passed (an illustrative signature is below). This allows the FSAL to decide exactly which ones it needs; if it needs a generic "thing" for anonymous I/O, it can stash a generic fsal_fd in its object_handle.

The benefit of this interface is that the FSAL can leverage the cache inode AVL tree and the SAL hash tables without making upcalls (which would be fraught with locking issues) and without duplicating hash tables in order to store information entirely separately. It may also open possibilities for better hinting between the layers so garbage collection can be improved.

In the meantime, the legacy interface is still available. If some FSALs truly will never need this function, but they want to benefit from the atomic open/create/setattr capability of the FSAL open_fd method, we can make a more generic version of that (well, actually, it just needs to be OK with having NULL passed for the fsal_fd).
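To make the fsal_fd idea concrete, here's a rough sketch. This is illustrative only, not the final interface; the field and struct names are placeholders, though fsal_openflags_t is the existing Ganesha open-flags type:

    /* Generic part, embedded at the start of each FSAL-private fd. */
    struct fsal_fd {
            fsal_openflags_t openflags;     /* open mode, for now */
    };

    /* What FSAL_VFS might define; the FSAL would report
     * sizeof(struct vfs_fd) so the space can be allocated along
     * with each Ganesha stateid (or state_t for NFS v3).
     */
    struct vfs_fd {
            struct fsal_fd fsal_fd;         /* generic part first */
            int fd;                         /* the kernel file descriptor */
    };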
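And an operation in the new interface might look something like the following. Again, purely illustrative; read2 is a placeholder name, not a committed method, and either fd pointer can be NULL when the FSAL reported a zero size for it:

    fsal_status_t (*read2)(struct fsal_obj_handle *obj_hdl,
                           struct fsal_fd *share_fd,   /* may be NULL */
                           struct fsal_fd *lock_fd,    /* may be NULL */
                           uint64_t offset,
                           size_t buffer_size,
                           void *buffer,
                           size_t *read_amount,
                           bool *end_of_file);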
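For reference, since OFD locks came up above, here's a minimal sketch of their use (Linux-specific, kernel 3.15+). Unlike classic POSIX locks, an OFD lock is owned by the open file description rather than the process, so closing one fd on a file does not drop locks taken through another fd, and locks on different open file descriptions conflict with each other. That's what would let FSAL_VFS keep one fd per lock owner per file:

    #define _GNU_SOURCE
    #include <fcntl.h>

    /* Take a non-blocking write lock on [start, start+len) using an
     * OFD lock.  l_pid must be 0 for OFD locks.
     */
    static int ofd_write_lock(int fd, off_t start, off_t len)
    {
            struct flock fl = {
                    .l_type = F_WRLCK,
                    .l_whence = SEEK_SET,
                    .l_start = start,
                    .l_len = len,
                    .l_pid = 0,     /* required for OFD locks */
            };

            /* F_OFD_SETLKW would block instead of failing */
            return fcntl(fd, F_OFD_SETLK, &fl);
    }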
I'm also thinking that eventually the cache inode content_lock will no longer be what protects the generic "thing"; FSAL_VFS is already using the object_handle lock to protect access to the fd associated with the object_handle. Removing the need to hold the content_lock would mean FSAL_VFS could actually manage its own count of open file descriptors and do some management of them (maybe still in conjunction with cache inode, to benefit from the LRU table). For example, a setattr or getattr call could then result in an open fd that FSAL_VFS would not immediately close.

There is also an eventual goal of getting rid of the insane logic SAL has to manage lock state when the FSAL supports locks but does not support lock owners. This logic is not too bad on LOCK requests, but UNLOCK requires walking the entire lock list for a file to figure out what portions of the UNLOCK request are held by other lock owners, since the unlocked range has to be punched out around each region some other owner still holds. This can result in N+1 lock requests (and memory objects) to perform the unlock, where N is the number of locks held by other lock owners.

Frank