Re: [autofs] [PATCH 01/18] Add a dentry op to handle automounting rather than abusing follow_link() [ver #4]
Al Viro v...@zeniv.linux.org.uk wrote: OK, umount_tree bug (the source of AFS leak) got presumably fixed in #untested. Have fun... Works for me. Acked-by: David Howells dhowe...@redhat.com ___ autofs mailing list autofs@linux.kernel.org http://linux.kernel.org/mailman/listinfo/autofs
Re: [autofs] [PATCH 01/18] Add a dentry op to handle automounting rather than abusing follow_link()
Nick Piggin npig...@gmail.com wrote: You still have to notice that it is .d_automount in rcu-walk mode, and bail out if it is. I can't see where you do that. follow_managed(), and thus follow_automount() and -d_automount(), are never reached in rcu_walk mode, from what I can tell of the code. There are two places follow_managed() is called: (1) do_lookup() - where follow_managed() is only called in the else-part of an if-statement contingent on a check of LOOKUP_RCU. (2) do_last() - where follow_managed() is subsequent to a mutex having been taken, so rcu-walk mode must have been exited prior to this as the process may have needed to sleep. At least, I'm assuming you may not sleep whilst in rcu-walk mode. David ___ autofs mailing list autofs@linux.kernel.org http://linux.kernel.org/mailman/listinfo/autofs
Re: [autofs] [PATCH 01/18] Add a dentry op to handle automounting rather than abusing follow_link()
Nick Piggin npig...@gmail.com wrote: So something has gone wrong here. You have documented .d_automount can be called in rcu-walk mode, but it doesn't seem to be the case. Ah. You removed a column and installed a new one, and I didn't notice. Neither d_automount() and d_manage() should be entered in rcu-walk mode since they're both expected to sleep. Btw, should you add a fifth column for d_seq? I should also add a column for namespace_sem as d_manage() may be called with that held (and d_automount() must not be called with that held). David ___ autofs mailing list autofs@linux.kernel.org http://linux.kernel.org/mailman/listinfo/autofs
Re: [autofs] [PATCH 01/18] Add a dentry op to handle automounting rather than abusing follow_link()
Nick Piggin npig...@gmail.com wrote: I would still prefer to see a .follow_mount API, and not tie in this automount specific inode detail to what could be a more flexible dentry-only API. The default NULL implementation would do nothing, and follow_automount stuff can be in fs/libfs.c to be used by filesystem, rather than fs/namei.c. Looking further in the patchset at the d_managed thing, that's almost what I'm getting at. But I don't see why any of this stuff has to happen in fs/namei.c. Just call the function from path walk, and provide helpers in libfs or something if there is a lot of common code between autofs4 and others (and leave it autofs specifc when that is the case). Of course, that would be the obvious and naive first approach. So really my question is why did that not work? And can we make it work? You have a strange idea of what is 'obvious and naive'. These are parts of pathwalk, and as such should be in fs/namei.c. I'd rather not expose pathwalking directly to the filesystem, though I acknowledge that sometimes it is necessary to let the filesystem influence it. You need to consider d_automount() and d_manage() separately as they provide two quite different hooks with different properties. Firstly, d_automount(). The following are my points of consideration. (0) You're only allowed to automount on a directory. (1) Automounting does not need to be done when we follow .. to an automount point. (2) Automount points that are mounted upon within the current namespace can just be skipped over. This is the fast path. (3) All the filesystem should need as a parameter to determine what it is allowed to mount is the inode and dentry of the automount point. This holds true for all the things that currently do automounting (AFS, CIFS, NFS, autofs). (4) All the filesystem should need to do is set up a vfsmount struct and publish it or return an indication that there was a collision and the transit should be retried. (5) The filesystem is expected to sleep to achieve the automount, so spinlocks, RCU locks, preemption disablements or interrupt disablements may not be held across this function. (6) The filesystem is expected to need a write lock on namespace_sem at some point, so this must not be held across the call to d_automount(). (7) The filesystem won't necessarily be calling do_add_mount() itself in d_automount() - in autofs's case, the construction is performed by the userspace daemon and then autofs4_d_automount() indicates a collision - so we can't move the do_add_mount() to the caller. Additionally, the filesystem may want to use an expiration list. (8) There needs to be some limitation in place to prevent runaway automounting. The ELOOP detection mechanism can be used for this. Taking these considerations, it shows that a small amount of code can be inserted into pathwalk and used for everything. However, having worked with Ian to try and get autofs4 to work with this, we came up with d_manage() to add in a missing piece. Note that autofs4 also uses d_automount() to build directory structures behind the mountpoint rather than mounting on the mountpoint. In this case, it clears the AUTOMOUNT flag when construction is complete. I've allowed d_automount() to return -EISDIR to follow_automount() to indicate that no mount should be attempted here, and the directory should be given back to pathwalk to treat as a directory. This allows autofs's daemon access to the directory. Having follow_automount() update the path it has been given with the new vfsmount and root dentry is purely an optimisation; we could instead simply return and __follow_mount() will do lookup_mnt() again as it would if a collision is reported. In answer to why I haven't made __follow_mount_rcu() handle automount points, I thought previously I saw a reason why it was unnecessary, but now I'm not so sure. It may be that if there are child objects of this dentry then it will walk onto those rather than automounting - but for some reason it seems still to automount rather than doing that. Secondly, d_manage(). The following are the points of consideration: (1) A filesystem may want to hold up client processes that want to transit from directories in its control during pathwalk - such as when autofs is letting its userspace daemon tear down the stuff mounted on or created behind a directory. (2) A transit may be from a directory to a directory mounted over it, or from a directory to an object (file, dir, etc.) pointed to by an entry in that directoy. (3) The management of dentries in this fashion is a transient affair. (4) The mode in which the filesystem is normally entered for this purpose should be disabled as soon as possible, though it may be reenabled later if needed. (5) When the filesystem is ready it should let the held processes proceed
[autofs] [PATCH 01/18] Add a dentry op to handle automounting rather than abusing follow_link() [ver #4]
Add a dentry op (d_automount) to handle automounting directories rather than abusing the follow_link() inode operation. The operation is keyed off a new dentry flag (DCACHE_NEED_AUTOMOUNT). This also makes it easier to add an AT_ flag to suppress terminal segment automount during pathwalk and removes the need for the kludge code in the pathwalk algorithm to handle directories with follow_link() semantics. The -d_automount() dentry operation: struct vfsmount *(*d_automount)(struct path *mountpoint); takes a pointer to the directory to be mounted upon, which is expected to provide sufficient data to determine what should be mounted. If successful, it should return the vfsmount struct it creates (which it should also have added to the namespace using do_add_mount() or similar). If there's a collision with another automount attempt, NULL should be returned. If the directory specified by the parameter should be used directly rather than being mounted upon, -EISDIR should be returned. In any other case, an error code should be returned. The -d_automount() operation is called with no locks held and may sleep. At this point the pathwalk algorithm will be in ref-walk mode. Within fs/namei.c itself, a new pathwalk subroutine (follow_automount()) is added to handle mountpoints. It will return -EREMOTE if the automount flag was set, but no d_automount() op was supplied, -ELOOP if we've encountered too many symlinks or mountpoints, -EISDIR if the walk point should be used without mounting and 0 if successful. The path will be updated to point to the mounted filesystem if a successful automount took place. __follow_mount() is replaced by follow_managed() which is more generic (especially with the patch that adds -d_manage()). This handles transits from directories during pathwalk, including automounting and skipping over mountpoints (and holding processes with the next patch). __follow_mount_rcu() will jump out of RCU-walk mode if it encounters an automount point with nothing mounted on it. follow_dotdot*() does not handle automounts as you don't want to trigger them whilst following ... I've also extracted the mount/don't-mount logic from autofs4 and included it here. It makes the mount go ahead anyway if someone calls open() or creat(), tries to traverse the directory, tries to chdir/chroot/etc. into the directory, or sticks a '/' on the end of the pathname. If they do a stat(), however, they'll only trigger the automount if they didn't also say O_NOFOLLOW. I've also added an inode flag (S_AUTOMOUNT) so that filesystems can mark their inodes as automount points. This flag is automatically propagated to the dentry as DCACHE_NEED_AUTOMOUNT by __d_instantiate(). This saves NFS and could save AFS a private flag bit apiece, but is not strictly necessary. It would be preferable to do the propagation in d_set_d_op(), but that doesn't normally have access to the inode. Signed-off-by: David Howells dhowe...@redhat.com Was-Acked-by: Ian Kent ra...@themaw.net --- Documentation/filesystems/Locking |2 Documentation/filesystems/vfs.txt | 14 +++ fs/dcache.c |5 + fs/namei.c| 205 + include/linux/dcache.h|7 + include/linux/fs.h|2 6 files changed, 187 insertions(+), 48 deletions(-) diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 977d891..5f0c52a 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -19,6 +19,7 @@ prototypes: void (*d_release)(struct dentry *); void (*d_iput)(struct dentry *, struct inode *); char *(*d_dname)((struct dentry *dentry, char *buffer, int buflen); + struct vfsmount *(*d_automount)(struct path *path); locking rules: rename_lock -d_lockmay block rcu-walk @@ -29,6 +30,7 @@ d_delete: no yes no no d_release: no no yes no d_iput:no no yes no d_dname: no no no no +d_automount: no no yes no --- inode_operations --- prototypes: diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index fbb324e..992cf74 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -864,6 +864,7 @@ struct dentry_operations { void (*d_release)(struct dentry *); void (*d_iput)(struct dentry *, struct inode *); char *(*d_dname)(struct dentry *, char *, int); + struct vfsmount *(*d_automount)(struct path *); }; d_revalidate: called when the VFS needs to revalidate a dentry. This @@ -930,6 +931,19 @@ struct dentry_operations { at the end of the
Re: [autofs] [PATCH 01/18] Add a dentry op to handle automounting rather than abusing follow_link()
On Wed, Jan 12, 2011 at 4:48 AM, David Howells dhowe...@redhat.com wrote: Add a dentry op (d_automount) to handle automounting directories rather than abusing the follow_link() inode operation. The operation is keyed off a new inode flag (S_AUTOMOUNT). This makes it easier to add an AT_ flag to suppress terminal segment automount during pathwalk. It should also remove the need for the kludge code in the pathwalk algorithm to handle directories with follow_link() semantics. A new pathwalk subroutine, follow_automount() is added to handle mountpoints. It will return -EREMOTE if the S_AUTOMOUNT was set, but no d_automount() op was supplied, -ELOOP if we've encountered too many symlinks or mountpoints, -EISDIR if the walk point should be used without mounting and 0 if successful. path will be updated if an automount took place to point to the mounted filesystem. I've only changed __follow_mount() to handle call follow_automount(), but it might be necessary to change follow_mount() too. The latter is only called from follow_dotdot(), but any automounts on .. should be pinned whilst we're using a child of it. I've also extracted the mount/don't-mount logic from autofs4 and included it here. It makes the mount go ahead anyway if someone calls open() or creat(), tries to traverse the directory, tries to chdir/chroot/etc. into the directory, or sticks a '/' on the end of the pathname. If they do a stat(), however, they'll only trigger the automount if they didn't also say O_NOFOLLOW. Signed-off-by: David Howells dhowe...@redhat.com Acked-by: Ian Kent ra...@themaw.net I would still prefer to see a .follow_mount API, and not tie in this automount specific inode detail to what could be a more flexible dentry-only API. The default NULL implementation would do nothing, and follow_automount stuff can be in fs/libfs.c to be used by filesystem, rather than fs/namei.c. I would rather that the automount call just attach the mount and then go back to namei.c where it does the mount hash lookup. That should take like a couple of lines in the path lookup code. locking rules: rename_lock -d_lock may block rcu-walk @@ -29,6 +30,7 @@ d_delete: no yes no no d_release: no no yes no d_iput: no no yes no d_dname: no no no no +d_automount: no no no yes +static void __follow_mount_rcu(struct nameidata *nd, struct path *path, + struct inode **inode) +{ + while (d_mountpoint(path-dentry)) { + struct vfsmount *mounted; + mounted = __lookup_mnt(path-mnt, path-dentry, 1); + if (!mounted) + return; + path-mnt = mounted; + path-dentry = mounted-mnt_root; + nd-seq = read_seqcount_begin(path-dentry-d_seq); + *inode = path-dentry-d_inode; + } +} Where we have 2 functions, one with _rcu postfix, the _rcu version is used for rcu-walk lookups, and the other one used for ref-walk lookups. This means they have to provide the exact same functionality, or where that isn't possible, the _rcu variant must return -ECHILD. So something has gone wrong here. You have documented .d_automount can be called in rcu-walk mode, but it doesn't seem to be the case. And in the _rcu variant you skip checking automount functionality entirely. So at least you'd have to check for d_automount_point and bail out if it exists. But you need to be careful: +#define d_automount_point(dentry) \ + (dentry-d_inode IS_AUTOMOUNT(dentry-d_inode)) @@ -277,6 +278,7 @@ struct inodes_stat_t { #define IS_SWAPFILE(inode) ((inode)-i_flags S_SWAPFILE) #define IS_PRIVATE(inode) ((inode)-i_flags S_PRIVATE) #define IS_IMA(inode) ((inode)-i_flags S_IMA) +#define IS_AUTOMOUNT(inode) ((inode)-i_flags S_AUTOMOUNT) This guy will oops in rcu-walk mode, because dentry-d_inode can go NULL at any time. If you do want to do this kind of thing in rcu-walk mode, the path walking code carries the inode around for use (instead of -d_inode). If you make sure to have dropped out from rcu-walk mode, then d_inode can be relied on to be stable, of course. Thanks, Nick ___ autofs mailing list autofs@linux.kernel.org http://linux.kernel.org/mailman/listinfo/autofs
Re: [autofs] [PATCH 01/18] Add a dentry op to handle automounting rather than abusing follow_link()
On Thu, Jan 13, 2011 at 2:53 PM, Nick Piggin npig...@gmail.com wrote: On Wed, Jan 12, 2011 at 4:48 AM, David Howells dhowe...@redhat.com wrote: Add a dentry op (d_automount) to handle automounting directories rather than abusing the follow_link() inode operation. The operation is keyed off a new inode flag (S_AUTOMOUNT). This makes it easier to add an AT_ flag to suppress terminal segment automount during pathwalk. It should also remove the need for the kludge code in the pathwalk algorithm to handle directories with follow_link() semantics. A new pathwalk subroutine, follow_automount() is added to handle mountpoints. It will return -EREMOTE if the S_AUTOMOUNT was set, but no d_automount() op was supplied, -ELOOP if we've encountered too many symlinks or mountpoints, -EISDIR if the walk point should be used without mounting and 0 if successful. path will be updated if an automount took place to point to the mounted filesystem. I've only changed __follow_mount() to handle call follow_automount(), but it might be necessary to change follow_mount() too. The latter is only called from follow_dotdot(), but any automounts on .. should be pinned whilst we're using a child of it. I've also extracted the mount/don't-mount logic from autofs4 and included it here. It makes the mount go ahead anyway if someone calls open() or creat(), tries to traverse the directory, tries to chdir/chroot/etc. into the directory, or sticks a '/' on the end of the pathname. If they do a stat(), however, they'll only trigger the automount if they didn't also say O_NOFOLLOW. Signed-off-by: David Howells dhowe...@redhat.com Acked-by: Ian Kent ra...@themaw.net I would still prefer to see a .follow_mount API, and not tie in this automount specific inode detail to what could be a more flexible dentry-only API. The default NULL implementation would do nothing, and follow_automount stuff can be in fs/libfs.c to be used by filesystem, rather than fs/namei.c. Looking further in the patchset at the d_managed thing, that's almost what I'm getting at. But I don't see why any of this stuff has to happen in fs/namei.c. Just call the function from path walk, and provide helpers in libfs or something if there is a lot of common code between autofs4 and others (and leave it autofs specifc when that is the case). Of course, that would be the obvious and naive first approach. So really my question is why did that not work? And can we make it work? Second observation when adding rcu-walk/ref-walk operations. What we've done now (which is what Linus preferred and I agree with now) is to make functions always able to accept rcu-walk mode, and have filesystems bail out as needed. Unless there are really fundamental reasons why rcu-walk won't work (eg. -lookup, if it is required to do allocations and IO), or if it really doesn't matter (eg. a function to be called once to set up a mount, and never again). ___ autofs mailing list autofs@linux.kernel.org http://linux.kernel.org/mailman/listinfo/autofs
Re: [autofs] [PATCH 01/18] Add a dentry op to handle automounting rather than abusing follow_link()
On Thu, Jan 13, 2011 at 11:20 PM, David Howells dhowe...@redhat.com wrote: Nick Piggin npig...@gmail.com wrote: So something has gone wrong here. You have documented .d_automount can be called in rcu-walk mode, but it doesn't seem to be the case. Ah. You removed a column and installed a new one, and I didn't notice. You still have to notice that it is .d_automount in rcu-walk mode, and bail out if it is. I can't see where you do that. Neither d_automount() and d_manage() should be entered in rcu-walk mode since they're both expected to sleep. Btw, should you add a fifth column for d_seq? rcu-walk means that rcu_read_lock is held, d_seq is open, vfsmount_lock is held for read. So it's redundant. I should also add a column for namespace_sem as d_manage() may be called with that held (and d_automount() must not be called with that held). More documentation the merrier. ___ autofs mailing list autofs@linux.kernel.org http://linux.kernel.org/mailman/listinfo/autofs
[autofs] [PATCH 01/18] Add a dentry op to handle automounting rather than abusing follow_link()
Add a dentry op (d_automount) to handle automounting directories rather than abusing the follow_link() inode operation. The operation is keyed off a new inode flag (S_AUTOMOUNT). This makes it easier to add an AT_ flag to suppress terminal segment automount during pathwalk. It should also remove the need for the kludge code in the pathwalk algorithm to handle directories with follow_link() semantics. A new pathwalk subroutine, follow_automount() is added to handle mountpoints. It will return -EREMOTE if the S_AUTOMOUNT was set, but no d_automount() op was supplied, -ELOOP if we've encountered too many symlinks or mountpoints, -EISDIR if the walk point should be used without mounting and 0 if successful. path will be updated if an automount took place to point to the mounted filesystem. I've only changed __follow_mount() to handle call follow_automount(), but it might be necessary to change follow_mount() too. The latter is only called from follow_dotdot(), but any automounts on .. should be pinned whilst we're using a child of it. I've also extracted the mount/don't-mount logic from autofs4 and included it here. It makes the mount go ahead anyway if someone calls open() or creat(), tries to traverse the directory, tries to chdir/chroot/etc. into the directory, or sticks a '/' on the end of the pathname. If they do a stat(), however, they'll only trigger the automount if they didn't also say O_NOFOLLOW. Signed-off-by: David Howells dhowe...@redhat.com Acked-by: Ian Kent ra...@themaw.net --- Documentation/filesystems/Locking |2 + Documentation/filesystems/vfs.txt | 13 fs/namei.c| 120 - include/linux/dcache.h|4 + include/linux/fs.h|2 + 5 files changed, 111 insertions(+), 30 deletions(-) diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 977d891..7ebe42d 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -19,6 +19,7 @@ prototypes: void (*d_release)(struct dentry *); void (*d_iput)(struct dentry *, struct inode *); char *(*d_dname)((struct dentry *dentry, char *buffer, int buflen); + struct vfsmount *(*d_automount)(struct path *path); locking rules: rename_lock -d_lockmay block rcu-walk @@ -29,6 +30,7 @@ d_delete: no yes no no d_release: no no yes no d_iput:no no yes no d_dname: no no no no +d_automount: no no no yes --- inode_operations --- prototypes: diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index fbb324e..bb8d277 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -864,6 +864,7 @@ struct dentry_operations { void (*d_release)(struct dentry *); void (*d_iput)(struct dentry *, struct inode *); char *(*d_dname)(struct dentry *, char *, int); + struct vfsmount *(*d_automount)(struct path *); }; d_revalidate: called when the VFS needs to revalidate a dentry. This @@ -930,6 +931,18 @@ struct dentry_operations { at the end of the buffer, and returns a pointer to the first char. dynamic_dname() helper function is provided to take care of this. + d_automount: called when an automount dentry is to be traversed (optional). + This should create a new VFS mount record, mount it on the directory + and return the record to the caller. The caller is supplied with a + path parameter giving the automount directory to describe the automount + target and the parent VFS mount record to provide inheritable mount + parameters. NULL should be returned if someone else managed to make + the automount first. If the automount failed, then an error code + should be returned. + + This function is only used if S_AUTOMOUNT is set on the inode to which + the dentry refers. + Example : static char *pipefs_dname(struct dentry *dent, char *buffer, int buflen) diff --git a/fs/namei.c b/fs/namei.c index 24ece10..159da29 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -877,38 +877,84 @@ int follow_up(struct path *path) } /* - * serialization is taken care of in namespace.c + * Perform an automount + * - return -EISDIR to tell __follow_mount() to stop and return the path we + * were called with. */ -static void __follow_mount_rcu(struct nameidata *nd, struct path *path, - struct inode **inode) +static int follow_automount(struct path *path, unsigned flags, + bool *need_mntput) { - while (d_mountpoint(path-dentry)) { - struct