Re: [autofs] [PATCH 01/18] Add a dentry op to handle automounting rather than abusing follow_link() [ver #4]

2011-01-16 Thread David Howells
Al Viro v...@zeniv.linux.org.uk wrote:

 OK, umount_tree bug (the source of AFS leak) got presumably fixed in
 #untested.  Have fun...

Works for me.

Acked-by: David Howells dhowe...@redhat.com

___
autofs mailing list
autofs@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/autofs


Re: [autofs] [PATCH 01/18] Add a dentry op to handle automounting rather than abusing follow_link()

2011-01-14 Thread David Howells
Nick Piggin npig...@gmail.com wrote:

 You still have to notice that it is .d_automount in rcu-walk mode, and bail
 out if it is. I can't see where you do that.

follow_managed(), and thus follow_automount() and -d_automount(), are never
reached in rcu_walk mode, from what I can tell of the code.

There are two places follow_managed() is called:

 (1) do_lookup() - where follow_managed() is only called in the else-part of
 an if-statement contingent on a check of LOOKUP_RCU.

 (2) do_last() - where follow_managed() is subsequent to a mutex having been
 taken, so rcu-walk mode must have been exited prior to this as the
 process may have needed to sleep.

At least, I'm assuming you may not sleep whilst in rcu-walk mode.

David

___
autofs mailing list
autofs@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/autofs


Re: [autofs] [PATCH 01/18] Add a dentry op to handle automounting rather than abusing follow_link()

2011-01-13 Thread David Howells
Nick Piggin npig...@gmail.com wrote:

 So something has gone wrong here. You have documented .d_automount
 can be called in rcu-walk mode, but it doesn't seem to be the case.

Ah.  You removed a column and installed a new one, and I didn't notice.

Neither d_automount() and d_manage() should be entered in rcu-walk mode since
they're both expected to sleep.

Btw, should you add a fifth column for d_seq?

I should also add a column for namespace_sem as d_manage() may be called with
that held (and d_automount() must not be called with that held).

David

___
autofs mailing list
autofs@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/autofs


Re: [autofs] [PATCH 01/18] Add a dentry op to handle automounting rather than abusing follow_link()

2011-01-13 Thread David Howells
Nick Piggin npig...@gmail.com wrote:

  I would still prefer to see a .follow_mount API, and not tie in this
  automount specific inode detail to what could be a more flexible
  dentry-only API.
 
  The default NULL implementation would do nothing, and follow_automount
  stuff can be in fs/libfs.c to be used by filesystem, rather than
  fs/namei.c.
 
 Looking further in the patchset at the d_managed thing, that's almost what
 I'm getting at.
 
 But I don't see why any of this stuff has to happen in fs/namei.c. Just call
 the function from path walk, and provide helpers in libfs or something if
 there is a lot of common code between autofs4 and others (and leave it autofs
 specifc when that is the case).
 
 Of course, that would be the obvious and naive first approach. So really my
 question is why did that not work? And can we make it work?

You have a strange idea of what is 'obvious and naive'.  These are parts of
pathwalk, and as such should be in fs/namei.c.  I'd rather not expose
pathwalking directly to the filesystem, though I acknowledge that sometimes it
is necessary to let the filesystem influence it.

You need to consider d_automount() and d_manage() separately as they provide
two quite different hooks with different properties.

Firstly, d_automount().  The following are my points of consideration.

 (0) You're only allowed to automount on a directory.

 (1) Automounting does not need to be done when we follow .. to an automount
 point.

 (2) Automount points that are mounted upon within the current namespace can
 just be skipped over.  This is the fast path.

 (3) All the filesystem should need as a parameter to determine what it is
 allowed to mount is the inode and dentry of the automount point.  This
 holds true for all the things that currently do automounting (AFS, CIFS,
 NFS, autofs).

 (4) All the filesystem should need to do is set up a vfsmount struct and
 publish it or return an indication that there was a collision and the
 transit should be retried.

 (5) The filesystem is expected to sleep to achieve the automount, so
 spinlocks, RCU locks, preemption disablements or interrupt disablements
 may not be held across this function.

 (6) The filesystem is expected to need a write lock on namespace_sem at some
 point, so this must not be held across the call to d_automount().

 (7) The filesystem won't necessarily be calling do_add_mount() itself in
 d_automount() - in autofs's case, the construction is performed by the
 userspace daemon and then autofs4_d_automount() indicates a collision - so
 we can't move the do_add_mount() to the caller.  Additionally, the
 filesystem may want to use an expiration list.

 (8) There needs to be some limitation in place to prevent runaway
 automounting.  The ELOOP detection mechanism can be used for this.

Taking these considerations, it shows that a small amount of code can be
inserted into pathwalk and used for everything.  However, having worked with
Ian to try and get autofs4 to work with this, we came up with d_manage() to add
in a missing piece.

Note that autofs4 also uses d_automount() to build directory structures behind
the mountpoint rather than mounting on the mountpoint.  In this case, it clears
the AUTOMOUNT flag when construction is complete.

I've allowed d_automount() to return -EISDIR to follow_automount() to indicate
that no mount should be attempted here, and the directory should be given back
to pathwalk to treat as a directory.  This allows autofs's daemon access to the
directory.

Having follow_automount() update the path it has been given with the new
vfsmount and root dentry is purely an optimisation; we could instead simply
return and __follow_mount() will do lookup_mnt() again as it would if a
collision is reported.

In answer to why I haven't made __follow_mount_rcu() handle automount points, I
thought previously I saw a reason why it was unnecessary, but now I'm not so
sure.  It may be that if there are child objects of this dentry then it will
walk onto those rather than automounting - but for some reason it seems still
to automount rather than doing that.


Secondly, d_manage().  The following are the points of consideration:

 (1) A filesystem may want to hold up client processes that want to transit
 from directories in its control during pathwalk - such as when autofs is
 letting its userspace daemon tear down the stuff mounted on or created
 behind a directory.

 (2) A transit may be from a directory to a directory mounted over it, or from
 a directory to an object (file, dir, etc.) pointed to by an entry in that
 directoy.

 (3) The management of dentries in this fashion is a transient affair.

 (4) The mode in which the filesystem is normally entered for this purpose
 should be disabled as soon as possible, though it may be reenabled later
 if needed.

 (5) When the filesystem is ready it should let the held processes proceed 

[autofs] [PATCH 01/18] Add a dentry op to handle automounting rather than abusing follow_link() [ver #4]

2011-01-13 Thread David Howells
Add a dentry op (d_automount) to handle automounting directories rather than
abusing the follow_link() inode operation.  The operation is keyed off a new
dentry flag (DCACHE_NEED_AUTOMOUNT).

This also makes it easier to add an AT_ flag to suppress terminal segment
automount during pathwalk and removes the need for the kludge code in the
pathwalk algorithm to handle directories with follow_link() semantics.

The -d_automount() dentry operation:

struct vfsmount *(*d_automount)(struct path *mountpoint);

takes a pointer to the directory to be mounted upon, which is expected to
provide sufficient data to determine what should be mounted.  If successful, it
should return the vfsmount struct it creates (which it should also have added
to the namespace using do_add_mount() or similar).  If there's a collision with
another automount attempt, NULL should be returned.  If the directory specified
by the parameter should be used directly rather than being mounted upon,
-EISDIR should be returned.  In any other case, an error code should be
returned.

The -d_automount() operation is called with no locks held and may sleep.  At
this point the pathwalk algorithm will be in ref-walk mode.


Within fs/namei.c itself, a new pathwalk subroutine (follow_automount()) is
added to handle mountpoints.  It will return -EREMOTE if the automount flag was
set, but no d_automount() op was supplied, -ELOOP if we've encountered too many
symlinks or mountpoints, -EISDIR if the walk point should be used without
mounting and 0 if successful.  The path will be updated to point to the mounted
filesystem if a successful automount took place.

__follow_mount() is replaced by follow_managed() which is more generic
(especially with the patch that adds -d_manage()).  This handles transits from
directories during pathwalk, including automounting and skipping over
mountpoints (and holding processes with the next patch).

__follow_mount_rcu() will jump out of RCU-walk mode if it encounters an
automount point with nothing mounted on it.

follow_dotdot*() does not handle automounts as you don't want to trigger them
whilst following ...

I've also extracted the mount/don't-mount logic from autofs4 and included it
here.  It makes the mount go ahead anyway if someone calls open() or creat(),
tries to traverse the directory, tries to chdir/chroot/etc. into the directory,
or sticks a '/' on the end of the pathname.  If they do a stat(), however,
they'll only trigger the automount if they didn't also say O_NOFOLLOW.

I've also added an inode flag (S_AUTOMOUNT) so that filesystems can mark their
inodes as automount points.  This flag is automatically propagated to the
dentry as DCACHE_NEED_AUTOMOUNT by __d_instantiate().  This saves NFS and could
save AFS a private flag bit apiece, but is not strictly necessary.  It would be
preferable to do the propagation in d_set_d_op(), but that doesn't normally
have access to the inode.

Signed-off-by: David Howells dhowe...@redhat.com
Was-Acked-by: Ian Kent ra...@themaw.net
---

 Documentation/filesystems/Locking |2 
 Documentation/filesystems/vfs.txt |   14 +++
 fs/dcache.c   |5 +
 fs/namei.c|  205 +
 include/linux/dcache.h|7 +
 include/linux/fs.h|2 
 6 files changed, 187 insertions(+), 48 deletions(-)

diff --git a/Documentation/filesystems/Locking 
b/Documentation/filesystems/Locking
index 977d891..5f0c52a 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -19,6 +19,7 @@ prototypes:
void (*d_release)(struct dentry *);
void (*d_iput)(struct dentry *, struct inode *);
char *(*d_dname)((struct dentry *dentry, char *buffer, int buflen);
+   struct vfsmount *(*d_automount)(struct path *path);
 
 locking rules:
rename_lock -d_lockmay block   rcu-walk
@@ -29,6 +30,7 @@ d_delete: no  yes no  
no
 d_release: no  no  yes no
 d_iput:no  no  yes no
 d_dname:   no  no  no  no
+d_automount:   no  no  yes no
 
 --- inode_operations --- 
 prototypes:
diff --git a/Documentation/filesystems/vfs.txt 
b/Documentation/filesystems/vfs.txt
index fbb324e..992cf74 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -864,6 +864,7 @@ struct dentry_operations {
void (*d_release)(struct dentry *);
void (*d_iput)(struct dentry *, struct inode *);
char *(*d_dname)(struct dentry *, char *, int);
+   struct vfsmount *(*d_automount)(struct path *);
 };
 
   d_revalidate: called when the VFS needs to revalidate a dentry. This
@@ -930,6 +931,19 @@ struct dentry_operations {
at the end of the 

Re: [autofs] [PATCH 01/18] Add a dentry op to handle automounting rather than abusing follow_link()

2011-01-13 Thread Nick Piggin
On Wed, Jan 12, 2011 at 4:48 AM, David Howells dhowe...@redhat.com wrote:
 Add a dentry op (d_automount) to handle automounting directories rather than
 abusing the follow_link() inode operation.  The operation is keyed off a new
 inode flag (S_AUTOMOUNT).

 This makes it easier to add an AT_ flag to suppress terminal segment automount
 during pathwalk.  It should also remove the need for the kludge code in the
 pathwalk algorithm to handle directories with follow_link() semantics.

 A new pathwalk subroutine, follow_automount() is added to handle mountpoints.
 It will return -EREMOTE if the S_AUTOMOUNT was set, but no d_automount() op 
 was
 supplied, -ELOOP if we've encountered too many symlinks or mountpoints, 
 -EISDIR
 if the walk point should be used without mounting and 0 if successful.  path
 will be updated if an automount took place to point to the mounted filesystem.

 I've only changed __follow_mount() to handle call follow_automount(), but it
 might be necessary to change follow_mount() too.  The latter is only called
 from follow_dotdot(), but any automounts on .. should be pinned whilst we're
 using a child of it.

 I've also extracted the mount/don't-mount logic from autofs4 and included it
 here.  It makes the mount go ahead anyway if someone calls open() or creat(),
 tries to traverse the directory, tries to chdir/chroot/etc. into the 
 directory,
 or sticks a '/' on the end of the pathname.  If they do a stat(), however,
 they'll only trigger the automount if they didn't also say O_NOFOLLOW.

 Signed-off-by: David Howells dhowe...@redhat.com
 Acked-by: Ian Kent ra...@themaw.net

I would still prefer to see a .follow_mount API, and not tie in this automount
specific inode detail to what could be a more flexible dentry-only API.

The default NULL implementation would do nothing, and follow_automount
stuff can be in fs/libfs.c to be used by filesystem, rather than fs/namei.c.

I would rather that the automount call just attach the mount and then go
back to namei.c where it does the mount hash lookup.

That should take like a couple of lines in the path lookup code.


  locking rules:
                rename_lock     -d_lock        may block       rcu-walk
 @@ -29,6 +30,7 @@ d_delete:     no              yes             no            
   no
  d_release:     no              no              yes             no
  d_iput:                no              no              yes             no
  d_dname:       no              no              no              no
 +d_automount:   no              no              no              yes

 +static void __follow_mount_rcu(struct nameidata *nd, struct path *path,
 +                               struct inode **inode)
 +{
 +       while (d_mountpoint(path-dentry)) {
 +               struct vfsmount *mounted;
 +               mounted = __lookup_mnt(path-mnt, path-dentry, 1);
 +               if (!mounted)
 +                       return;
 +               path-mnt = mounted;
 +               path-dentry = mounted-mnt_root;
 +               nd-seq = read_seqcount_begin(path-dentry-d_seq);
 +               *inode = path-dentry-d_inode;
 +       }
 +}

Where we have 2 functions, one with _rcu postfix, the _rcu version is used
for rcu-walk lookups, and the other one used for ref-walk lookups.

This means they have to provide the exact same functionality, or where
that isn't possible, the _rcu variant must return -ECHILD.

So something has gone wrong here. You have documented .d_automount
can be called in rcu-walk mode, but it doesn't seem to be the case. And in
the _rcu variant you skip checking automount functionality entirely. So at
least you'd have to check for d_automount_point and bail out if it exists. But
you need to be careful:


 +#define d_automount_point(dentry) \
 +       (dentry-d_inode  IS_AUTOMOUNT(dentry-d_inode))

 @@ -277,6 +278,7 @@ struct inodes_stat_t {
  #define IS_SWAPFILE(inode)     ((inode)-i_flags  S_SWAPFILE)
  #define IS_PRIVATE(inode)      ((inode)-i_flags  S_PRIVATE)
  #define IS_IMA(inode)          ((inode)-i_flags  S_IMA)
 +#define IS_AUTOMOUNT(inode)    ((inode)-i_flags  S_AUTOMOUNT)

This guy will oops in rcu-walk mode, because dentry-d_inode can go
NULL at any time.

If you do want to do this kind of thing in rcu-walk mode, the path walking
code carries the inode around for use (instead of -d_inode).

If you make sure to have dropped out from rcu-walk mode, then d_inode
can be relied on to be stable, of course.

Thanks,
Nick

___
autofs mailing list
autofs@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/autofs


Re: [autofs] [PATCH 01/18] Add a dentry op to handle automounting rather than abusing follow_link()

2011-01-13 Thread Nick Piggin
On Thu, Jan 13, 2011 at 2:53 PM, Nick Piggin npig...@gmail.com wrote:
 On Wed, Jan 12, 2011 at 4:48 AM, David Howells dhowe...@redhat.com wrote:
 Add a dentry op (d_automount) to handle automounting directories rather than
 abusing the follow_link() inode operation.  The operation is keyed off a new
 inode flag (S_AUTOMOUNT).

 This makes it easier to add an AT_ flag to suppress terminal segment 
 automount
 during pathwalk.  It should also remove the need for the kludge code in the
 pathwalk algorithm to handle directories with follow_link() semantics.

 A new pathwalk subroutine, follow_automount() is added to handle mountpoints.
 It will return -EREMOTE if the S_AUTOMOUNT was set, but no d_automount() op 
 was
 supplied, -ELOOP if we've encountered too many symlinks or mountpoints, 
 -EISDIR
 if the walk point should be used without mounting and 0 if successful.  path
 will be updated if an automount took place to point to the mounted 
 filesystem.

 I've only changed __follow_mount() to handle call follow_automount(), but it
 might be necessary to change follow_mount() too.  The latter is only called
 from follow_dotdot(), but any automounts on .. should be pinned whilst 
 we're
 using a child of it.

 I've also extracted the mount/don't-mount logic from autofs4 and included it
 here.  It makes the mount go ahead anyway if someone calls open() or creat(),
 tries to traverse the directory, tries to chdir/chroot/etc. into the 
 directory,
 or sticks a '/' on the end of the pathname.  If they do a stat(), however,
 they'll only trigger the automount if they didn't also say O_NOFOLLOW.

 Signed-off-by: David Howells dhowe...@redhat.com
 Acked-by: Ian Kent ra...@themaw.net

 I would still prefer to see a .follow_mount API, and not tie in this automount
 specific inode detail to what could be a more flexible dentry-only API.

 The default NULL implementation would do nothing, and follow_automount
 stuff can be in fs/libfs.c to be used by filesystem, rather than fs/namei.c.

Looking further in the patchset at the d_managed thing, that's almost what
I'm getting at.

But I don't see why any of this stuff has to happen in fs/namei.c. Just call the
function from path walk, and provide helpers in libfs or something if there is
a lot of common code between autofs4 and others (and leave it autofs specifc
when that is the case).

Of course, that would be the obvious and naive first approach. So really my
question is why did that not work? And can we make it work?

Second observation when adding rcu-walk/ref-walk operations. What we've
done now (which is what Linus preferred and I agree with now) is to make
functions always able to accept rcu-walk mode, and have filesystems bail
out as needed.

Unless there are really fundamental reasons why rcu-walk won't work (eg.
-lookup, if it is required to do allocations and IO), or if it really
doesn't matter
(eg. a function to be called once to set up a mount, and never again).

___
autofs mailing list
autofs@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/autofs


Re: [autofs] [PATCH 01/18] Add a dentry op to handle automounting rather than abusing follow_link()

2011-01-13 Thread Nick Piggin
On Thu, Jan 13, 2011 at 11:20 PM, David Howells dhowe...@redhat.com wrote:
 Nick Piggin npig...@gmail.com wrote:

 So something has gone wrong here. You have documented .d_automount
 can be called in rcu-walk mode, but it doesn't seem to be the case.

 Ah.  You removed a column and installed a new one, and I didn't notice.

You still have to notice that it is .d_automount in rcu-walk mode, and bail
out if it is. I can't see where you do that.


 Neither d_automount() and d_manage() should be entered in rcu-walk mode since
 they're both expected to sleep.

 Btw, should you add a fifth column for d_seq?

rcu-walk means that rcu_read_lock is held, d_seq is open, vfsmount_lock is
held for read. So it's redundant.

 I should also add a column for namespace_sem as d_manage() may be called with
 that held (and d_automount() must not be called with that held).

More documentation the merrier.

___
autofs mailing list
autofs@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/autofs


[autofs] [PATCH 01/18] Add a dentry op to handle automounting rather than abusing follow_link()

2011-01-11 Thread David Howells
Add a dentry op (d_automount) to handle automounting directories rather than
abusing the follow_link() inode operation.  The operation is keyed off a new
inode flag (S_AUTOMOUNT).

This makes it easier to add an AT_ flag to suppress terminal segment automount
during pathwalk.  It should also remove the need for the kludge code in the
pathwalk algorithm to handle directories with follow_link() semantics.

A new pathwalk subroutine, follow_automount() is added to handle mountpoints.
It will return -EREMOTE if the S_AUTOMOUNT was set, but no d_automount() op was
supplied, -ELOOP if we've encountered too many symlinks or mountpoints, -EISDIR
if the walk point should be used without mounting and 0 if successful.  path
will be updated if an automount took place to point to the mounted filesystem.

I've only changed __follow_mount() to handle call follow_automount(), but it
might be necessary to change follow_mount() too.  The latter is only called
from follow_dotdot(), but any automounts on .. should be pinned whilst we're
using a child of it.

I've also extracted the mount/don't-mount logic from autofs4 and included it
here.  It makes the mount go ahead anyway if someone calls open() or creat(),
tries to traverse the directory, tries to chdir/chroot/etc. into the directory,
or sticks a '/' on the end of the pathname.  If they do a stat(), however,
they'll only trigger the automount if they didn't also say O_NOFOLLOW.

Signed-off-by: David Howells dhowe...@redhat.com
Acked-by: Ian Kent ra...@themaw.net
---

 Documentation/filesystems/Locking |2 +
 Documentation/filesystems/vfs.txt |   13 
 fs/namei.c|  120 -
 include/linux/dcache.h|4 +
 include/linux/fs.h|2 +
 5 files changed, 111 insertions(+), 30 deletions(-)

diff --git a/Documentation/filesystems/Locking 
b/Documentation/filesystems/Locking
index 977d891..7ebe42d 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -19,6 +19,7 @@ prototypes:
void (*d_release)(struct dentry *);
void (*d_iput)(struct dentry *, struct inode *);
char *(*d_dname)((struct dentry *dentry, char *buffer, int buflen);
+   struct vfsmount *(*d_automount)(struct path *path);
 
 locking rules:
rename_lock -d_lockmay block   rcu-walk
@@ -29,6 +30,7 @@ d_delete: no  yes no  
no
 d_release: no  no  yes no
 d_iput:no  no  yes no
 d_dname:   no  no  no  no
+d_automount:   no  no  no  yes
 
 --- inode_operations --- 
 prototypes:
diff --git a/Documentation/filesystems/vfs.txt 
b/Documentation/filesystems/vfs.txt
index fbb324e..bb8d277 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -864,6 +864,7 @@ struct dentry_operations {
void (*d_release)(struct dentry *);
void (*d_iput)(struct dentry *, struct inode *);
char *(*d_dname)(struct dentry *, char *, int);
+   struct vfsmount *(*d_automount)(struct path *);
 };
 
   d_revalidate: called when the VFS needs to revalidate a dentry. This
@@ -930,6 +931,18 @@ struct dentry_operations {
at the end of the buffer, and returns a pointer to the first char.
dynamic_dname() helper function is provided to take care of this.
 
+  d_automount: called when an automount dentry is to be traversed (optional).
+   This should create a new VFS mount record, mount it on the directory
+   and return the record to the caller.  The caller is supplied with a
+   path parameter giving the automount directory to describe the automount
+   target and the parent VFS mount record to provide inheritable mount
+   parameters.  NULL should be returned if someone else managed to make
+   the automount first.  If the automount failed, then an error code
+   should be returned.
+
+   This function is only used if S_AUTOMOUNT is set on the inode to which
+   the dentry refers.
+
 Example :
 
 static char *pipefs_dname(struct dentry *dent, char *buffer, int buflen)
diff --git a/fs/namei.c b/fs/namei.c
index 24ece10..159da29 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -877,38 +877,84 @@ int follow_up(struct path *path)
 }
 
 /*
- * serialization is taken care of in namespace.c
+ * Perform an automount
+ * - return -EISDIR to tell __follow_mount() to stop and return the path we
+ *   were called with.
  */
-static void __follow_mount_rcu(struct nameidata *nd, struct path *path,
-   struct inode **inode)
+static int follow_automount(struct path *path, unsigned flags,
+   bool *need_mntput)
 {
-   while (d_mountpoint(path-dentry)) {
-   struct