Re: 2.4.2: opening deleted directories

2001-03-18 Thread Chuck Lever

repeating neil's question from below
  "why is the open(".") behavior an issue?"

there are a few reasons:

* NFS close-to-open cache consistency:

in NFS, close-to-open semantics require that attributes be
fetched from the server when a file/directory is opened.  this
behavior is in part to help an application determine whether
the file or directory still exists or has been replaced or
removed, as in my 'ls' example.

in the open(".") case, path_walk doesn't invoke either
d_lookup or d_revalidate, so there is no opportunity in the
present logic to retrieve the directory's attributes.  this
potentially breaks programs that depend on attributes
being correct when opening ".".  'make' and 'ls' are just two
examples.

this is, btw, the original problem Trond and I were discussing.
he pointed out this problem.

* rmdir behavior:

the POSIX 1003.1 definition of rmdir() states that:

 If the directory is the root directory or the current working directory
 of any process, the effect of this function is implementation-defined.

cop-out. it later states that:

 If one or more processes have the directory open when the last
 link is removed, the dot and dot-dot entries, if present, are removed
 before rmdir() returns and no new entries may be created in the
 directory.

this indicates to me that, while the directory may continue to exist
if it's the cwd of some other process, the "." and ".." entries must
be removed, or equivalently, that lookups of "." and ".." will always
fail after a directory is deleted.

* standard pathname resolution behavior:

according to POSIX 1003.1, resolving a relative pathname
means the resolution *begins* at the current working directory.
In our case, if we follow POSIX resolution strategy, after starting
at the cwd, a lookup of "." should be done.  At this point I infer
that since the directory has been deleted, "." doesn't exist, and
open() returns ENOENT.

IOW, according to the text in the standard, the current
working directory is not an open file descriptor, it is simply a
naming convenience used during pathname resolution.

* other broken system calls:

i haven't tried this, but i'd guess stat(".") would behave similarly.
thus stat(".") on such a removed directory would tell an application
that the directory exists when in fact it doesn't.  this borders on
insecure behavior.  fortunately, no other operations are allowed
on the directory.

* consistent behavior across operating systems:

there are other flavors of UNIX that don't appear to work
this way.  once the directory is removed, it cannot be opened,
both on Solaris and OpenBSD.  i don't have access to others
at the moment.  this complicates porting applications among
operating systems, if only slightly.

* open() is a name space operation:

open() converts a pathname into a file descriptor; it's a name
space operation.  i believe that if a file or directory no longer
exists, applications expect they will not be able to open the file
or directory because it is no longer attached to the file system's
name space.  i don't believe there are any other cases in Linux
where you can open a file or directory that has been removed,
are there?

* good design:

i believe in reporting an error as soon as it occurs.  there are no
other operations allowed on a deleted directory, but the open()
call is the first opportunity for the operating system to indicate
to an application that the directory is gone.



i'm not trying to start a rock fight.  but i think this behavior is a
little strange when compared to other systems, and especially the
NFS part is bothersome.  and yes, i know that ext2 doesn't
support lookups on ".".  but other file systems do... and Linux
is operating in a larger universe these days.

and NB: according to the POSIX standard's description of
relative pathname lookup and rmdir, i'd say that, if the cwd is
a deleted directory, open("..") should also fail.  i haven't
checked whether this is true or not.  but we do know that
".." is handled similarly in path_walk -- no d_lookup or
d_revalidate is done.
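
The case discussed above can be probed from user space. The following is a minimal sketch and is not taken from the thread: the helper name probe_deleted_cwd() and the /tmp scratch path are assumptions made for illustration. It creates a scratch directory, makes it the cwd, removes it, and then reports whether open() on "." or ".." still succeeds.

```c
#define _XOPEN_SOURCE 700   /* for mkdtemp() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: chdir into a fresh scratch directory, rmdir it,
 * then try to open `name` ("." or "..") relative to the deleted cwd.
 * Returns 0 if open() succeeded, the errno from open() if it failed,
 * or -1 if the setup or cleanup itself failed.
 */
int probe_deleted_cwd(const char *name)
{
    char tmpl[] = "/tmp/delcwd-XXXXXX";   /* assumed scratch location */
    char saved[4096];
    int fd, result;

    assert(name != NULL);
    if (getcwd(saved, sizeof(saved)) == NULL)
        return -1;
    if (mkdtemp(tmpl) == NULL)
        return -1;
    if (chdir(tmpl) != 0) {
        rmdir(tmpl);
        return -1;
    }
    /* Delete our own cwd by its absolute path. */
    if (rmdir(tmpl) != 0) {
        if (chdir(saved) == 0)
            rmdir(tmpl);
        return -1;
    }

    fd = open(name, O_RDONLY);
    result = (fd >= 0) ? 0 : errno;
    if (fd >= 0)
        close(fd);

    /* Go back to where we started so the caller is not left stranded. */
    if (chdir(saved) != 0)
        return -1;
    return result;
}
```

A return of 0 from probe_deleted_cwd(".") corresponds to the behavior reported here for 2.4.2; a system that refuses the open would be expected to give ENOENT instead. Passing ".." exercises the second case raised in the NB above.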


----- Original Message -----
From: "Neil Brown" [EMAIL PROTECTED]
To: "Chuck Lever" [EMAIL PROTECTED]
Cc: "Linux FS Developers" [EMAIL PROTECTED]; "Trond Myklebust"
[EMAIL PROTECTED]
Sent: Sunday, March 18, 2001 5:40 PM
Subject: Re: 2.4.2: opening deleted directories


> On Sunday March 18, [EMAIL PROTECTED] wrote:
> > on Linux, if my cwd is a deleted directory, i can still open it.  to wit:
> >
> > notice the open(".") -- it opens the current working directory that
> > is in effect for the "ls" command.  but i just deleted that directory
> > from another shell.  shouldn't that open(".") return ENOENT?
>
> Note that the error message you expect is "ENOENT" == Error, NO
> ENTry.  The ENTRY th

[PATCH 0/2] RFC: exporting per-superblock statistics to user space

2005-03-17 Thread Chuck Lever
 We still have a need to provide iostat like statistics for NFS
 clients.  Following are a couple of patches, against 2.6.11.3, which
 prototype an approach for providing this kind of data to user programs.

 I'd like some comment on the approach.

 01-mountstats.patch adds a new file called /proc/self/mountstats and a
 new file system method called show_stats.  this just replicates
 /proc/mounts and the show_options hook.

 02-nfs-iostat.patch teaches the NFS client to use the new show_stats
 hook as a demonstration.

 Note that this approach addresses previously voiced concerns about
 exporting per-superblock stats to user space.

 1. Processes can't see stats for file systems mounted outside their
namespace.

 2. Reading the stats file is serialized with mount and unmount
operations.

 3. The approach doesn't use /sys or kobjects.

 4. There are no lifetime issues tied to file systems loaded as a
module.
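
For illustration, here is how a user program might pick apart one line of the proposed file. This is only a sketch: the line layout ("device X mounted on Y with fstype Z", or "no device mounted on Y ...") is read off the show_vfsstat() routine in patch 1/2, and the struct and function names below are hypothetical. Whatever a file system appends through show_stats is left unparsed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fixed leading fields of one line. */
struct mountstat_entry {
    char device[256];      /* empty string when the line says "no device" */
    char mountpoint[256];
    char fstype[64];
};

/*
 * Parse the leading fields of one mountstats line.
 * Returns 0 on success, -1 if the line does not have the expected shape.
 * The names are whitespace-free because the kernel side escapes them.
 */
int parse_mountstats_line(const char *line, struct mountstat_entry *e)
{
    assert(line != NULL && e != NULL);
    memset(e, 0, sizeof(*e));

    if (sscanf(line, "device %255s mounted on %255s with fstype %63s",
               e->device, e->mountpoint, e->fstype) == 3)
        return 0;

    if (sscanf(line, "no device mounted on %255s with fstype %63s",
               e->mountpoint, e->fstype) == 2) {
        e->device[0] = '\0';
        return 0;
    }
    return -1;
}
```

A caller would read /proc/self/mountstats line by line (for example with fgets) and hand each line to parse_mountstats_line(); anything after the fstype field belongs to the file system's own statistics.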

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] VFS: New /proc file /proc/self/mountstats

2005-03-17 Thread Chuck Lever
 Create a new file under /proc/self, called mountstats, where mounted file
 systems can export information (configuration options, performance counters,
 and so on).  Use a mechanism similar to /proc/mounts and s_op->show_options.

 This mechanism does not violate namespace security, and is safe to use while
 other processes are unmounting file systems.
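
The iterator contract this relies on (start at a saved position, step with next, stop when NULL comes back) can be modelled in user space. The sketch below is an assumption-laden stand-in, not the kernel code: struct sk_mount and the sk_* functions are hypothetical, and the read lock that the real start/stop pair takes and releases is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry on a namespace's mount list. */
struct sk_mount {
    const char *fstype;
    struct sk_mount *next;
};

/* start: skip `pos` entries from the head; NULL means "past the end". */
const struct sk_mount *sk_start(const struct sk_mount *head, long pos)
{
    assert(pos >= 0);
    while (head != NULL && pos-- > 0)
        head = head->next;
    return head;
}

/* next: bump the saved position and return the following entry, or NULL. */
const struct sk_mount *sk_next(const struct sk_mount *cur, long *pos)
{
    assert(cur != NULL && pos != NULL);
    (*pos)++;
    return cur->next;
}

/* One read pass: count the entries seen when resuming from `pos`. */
long sk_count_from(const struct sk_mount *head, long pos)
{
    long n = 0;
    const struct sk_mount *m;

    for (m = sk_start(head, pos); m != NULL; m = sk_next(m, &pos))
        n++;
    return n;
}
```

Because each pass re-derives its entry from a position rather than keeping a pointer across reads, a reader that resumes later does not hold a stale pointer; in the kernel version the lock held between start and stop is what keeps the list stable during a pass.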

 Version: Mon, 14 Mar 2005 17:06:04 -0500
 
 Signed-off-by: Chuck Lever [EMAIL PROTECTED]
---
 
 fs/namespace.c |   66 +
 fs/proc/base.c |   40 +++
 include/linux/fs.h |1 
 3 files changed, 107 insertions(+)
 
 
diff -X /home/cel/src/linux/dont-diff -Naurp 00-stock/fs/namespace.c 
01-mountstats/fs/namespace.c
--- 00-stock/fs/namespace.c 2005-03-02 02:38:13.0 -0500
+++ 01-mountstats/fs/namespace.c2005-03-14 15:24:51.565085000 -0500
@@ -265,6 +265,72 @@ struct seq_operations mounts_op = {
.show   = show_vfsmnt
 };
 
+/* iterator */
+static void *ms_start(struct seq_file *m, loff_t *pos)
+{
+	struct namespace *n = m->private;
+	struct list_head *p;
+	loff_t l = *pos;
+
+	down_read(&n->sem);
+	list_for_each(p, &n->list)
+		if (!l--)
+			return list_entry(p, struct vfsmount, mnt_list);
+	return NULL;
+}
+
+static void *ms_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct namespace *n = m->private;
+	struct list_head *p = ((struct vfsmount *)v)->mnt_list.next;
+	(*pos)++;
+	return p==&n->list ? NULL : list_entry(p, struct vfsmount, mnt_list);
+}
+
+static void ms_stop(struct seq_file *m, void *v)
+{
+	struct namespace *n = m->private;
+	up_read(&n->sem);
+}
+
+static int show_vfsstat(struct seq_file *m, void *v)
+{
+	struct vfsmount *mnt = v;
+	int err = 0;
+
+	/* device */
+	if (mnt->mnt_devname) {
+		seq_puts(m, "device ");
+		mangle(m, mnt->mnt_devname);
+	} else
+		seq_puts(m, "no device");
+
+	/* mount point */
+	seq_puts(m, " mounted on ");
+	seq_path(m, mnt, mnt->mnt_root, " \t\n\\");
+	seq_putc(m, ' ');
+
+	/* file system type */
+	seq_puts(m, "with fstype ");
+	mangle(m, mnt->mnt_sb->s_type->name);
+
+	/* optional statistics */
+	if (mnt->mnt_sb->s_op->show_stats) {
+		seq_putc(m, ' ');
+		err = mnt->mnt_sb->s_op->show_stats(m, mnt);
+	}
+
+	seq_putc(m, '\n');
+	return err;
+}
+
+struct seq_operations mountstats_op = {
+	.start	= ms_start,
+	.next	= ms_next,
+	.stop	= ms_stop,
+	.show	= show_vfsstat,
+};
+
 /**
  * may_umount_tree - check if a mount tree is busy
  * @mnt: root of mount tree
diff -X /home/cel/src/linux/dont-diff -Naurp 00-stock/fs/proc/base.c 
01-mountstats/fs/proc/base.c
--- 00-stock/fs/proc/base.c 2005-03-02 02:38:12.0 -0500
+++ 01-mountstats/fs/proc/base.c2005-03-14 15:24:51.571085000 -0500
@@ -60,6 +60,7 @@ enum pid_directory_inos {
PROC_TGID_STATM,
PROC_TGID_MAPS,
PROC_TGID_MOUNTS,
+   PROC_TGID_MOUNTSTATS,
PROC_TGID_WCHAN,
 #ifdef CONFIG_SCHEDSTATS
PROC_TGID_SCHEDSTAT,
@@ -91,6 +92,7 @@ enum pid_directory_inos {
PROC_TID_STATM,
PROC_TID_MAPS,
PROC_TID_MOUNTS,
+   PROC_TID_MOUNTSTATS,
PROC_TID_WCHAN,
 #ifdef CONFIG_SCHEDSTATS
PROC_TID_SCHEDSTAT,
@@ -134,6 +136,7 @@ static struct pid_entry tgid_base_stuff[
 	E(PROC_TGID_ROOT,      "root",    S_IFLNK|S_IRWXUGO),
 	E(PROC_TGID_EXE,       "exe",     S_IFLNK|S_IRWXUGO),
 	E(PROC_TGID_MOUNTS,    "mounts",  S_IFREG|S_IRUGO),
+	E(PROC_TGID_MOUNTSTATS, "mountstats", S_IFREG|S_IRUGO),
 #ifdef CONFIG_SECURITY
 	E(PROC_TGID_ATTR,      "attr",    S_IFDIR|S_IRUGO|S_IXUGO),
 #endif
@@ -164,6 +167,7 @@ static struct pid_entry tid_base_stuff[]
 	E(PROC_TID_ROOT,       "root",    S_IFLNK|S_IRWXUGO),
 	E(PROC_TID_EXE,        "exe",     S_IFLNK|S_IRWXUGO),
 	E(PROC_TID_MOUNTS,     "mounts",  S_IFREG|S_IRUGO),
+	E(PROC_TID_MOUNTSTATS, "mountstats", S_IFREG|S_IRUGO),
 #ifdef CONFIG_SECURITY
 	E(PROC_TID_ATTR,       "attr",    S_IFDIR|S_IRUGO|S_IXUGO),
 #endif
@@ -528,6 +532,38 @@ static struct file_operations proc_mount
.release= mounts_release,
 };
 
+extern struct seq_operations mountstats_op;
+static int mountstats_open(struct inode *inode, struct file *file)
+{
+	struct task_struct *task = proc_task(inode);
+	int ret = seq_open(file, &mountstats_op);
+
+	if (!ret) {
+		struct seq_file *m = file->private_data;
+		struct namespace *namespace;
+		task_lock(task);
+		namespace = task->namespace;
+		if (namespace)
+			get_namespace(namespace);
+		task_unlock(task);
+
+		if (namespace)
+			m->private = namespace

[PATCH 2/2] NFS: add I/O performance counters

2005-03-17 Thread Chuck Lever
 Add an extensible per-superblock performance counter facility to the NFS
 client.  This facility mimics the counters available for block devices and
 for networking.

 Expose these new counters via /proc/self/mountstats.
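
 As a rough picture of what such a facility looks like, here is a small
 user-space sketch.  It is an assumption, not the contents of
 nfs_iostat.h: the counter names are borrowed from the call sites in the
 diff below, while the struct layout, NFS_SK_STAT_COUNT and the sk_*
 helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Counter indices; names taken from the diff, ordering is made up. */
enum nfs_sk_stat {
    NFS_VFS_OPEN,
    NFS_VFS_CLOSE,
    NFS_WIRE_READ_BYTES,
    NFS_DIRECT_READ_BYTES,
    NFS_SK_STAT_COUNT      /* hypothetical: number of counters */
};

/* Hypothetical per-mount counter block; one of these per superblock. */
struct sk_iostats {
    unsigned long long counters[NFS_SK_STAT_COUNT];
};

/* Add `delta` to one counter (byte counts and similar totals). */
void sk_add_stats(struct sk_iostats *s, enum nfs_sk_stat idx,
                  unsigned long long delta)
{
    assert(s != NULL && idx < NFS_SK_STAT_COUNT);
    s->counters[idx] += delta;
}

/* Bump one counter by one (event counts such as opens and closes). */
void sk_inc_stats(struct sk_iostats *s, enum nfs_sk_stat idx)
{
    sk_add_stats(s, idx, 1);
}
```

 Adding a new counter in this arrangement only means adding an enum
 entry before the count sentinel, which is one way to read "extensible"
 above.  The sketch ignores concurrency; a kernel version has to deal
 with updates from several CPUs.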

 Version: Mon, 14 Mar 2005 17:06:12 -0500
 
 Signed-off-by: Chuck Lever [EMAIL PROTECTED]
---
 
 fs/nfs/dir.c   |8 ++
 fs/nfs/direct.c|5 +
 fs/nfs/file.c  |   20 +++--
 fs/nfs/inode.c |  126 +++--
 fs/nfs/pagelist.c  |   12 ++-
 fs/nfs/read.c  |7 ++
 fs/nfs/write.c |   10 ++
 include/linux/nfs_fs_sb.h  |5 +
 include/linux/nfs_iostat.h |   80 +++
 9 files changed, 256 insertions(+), 17 deletions(-)
 
 
diff -X /home/cel/src/linux/dont-diff -Naurp 01-mountstats/fs/nfs/dir.c 
02-nfs-iostat/fs/nfs/dir.c
--- 01-mountstats/fs/nfs/dir.c  2005-03-02 02:38:09.0 -0500
+++ 02-nfs-iostat/fs/nfs/dir.c  2005-03-14 15:28:34.011484000 -0500
@@ -27,6 +27,7 @@
 #include <linux/mm.h>
 #include <linux/sunrpc/clnt.h>
 #include <linux/nfs_fs.h>
+#include <linux/nfs_iostat.h>
 #include <linux/nfs_mount.h>
 #include <linux/pagemap.h>
 #include <linux/smp_lock.h>
@@ -428,6 +429,8 @@ static int nfs_readdir(struct file *filp
 
 	lock_kernel();
 
+	nfs_inc_stats(inode, NFS_VFS_GETDENTS);
+
 	res = nfs_revalidate_inode(NFS_SERVER(inode), inode);
 	if (res < 0) {
 		unlock_kernel();
@@ -584,6 +587,7 @@ static int nfs_lookup_revalidate(struct 
 	parent = dget_parent(dentry);
 	lock_kernel();
 	dir = parent->d_inode;
+	nfs_inc_stats(dir, NFS_DENTRY_REVALIDATE);
 	inode = dentry->d_inode;
 
 	if (nd && !(nd->flags & LOOKUP_CONTINUE) && (nd->flags & LOOKUP_OPEN))
@@ -712,6 +716,7 @@ static struct dentry *nfs_lookup(struct 
 
 	dfprintk(VFS, "NFS: lookup(%s/%s)\n",
 		dentry->d_parent->d_name.name, dentry->d_name.name);
+	nfs_inc_stats(dir, NFS_VFS_LOOKUP);
 
 	res = ERR_PTR(-ENAMETOOLONG);
 	if (dentry->d_name.len > NFS_SERVER(dir)->namelen)
@@ -1116,6 +1121,7 @@ static int nfs_sillyrename(struct inode 
 	dfprintk(VFS, "NFS: silly-rename(%s/%s, ct=%d)\n",
 		dentry->d_parent->d_name.name, dentry->d_name.name, 
 		atomic_read(&dentry->d_count));
+	nfs_inc_stats(dir, NFS_SILLY_RENAME);
 
 #ifdef NFS_PARANOIA
 if (!dentry->d_inode)
@@ -1500,6 +1506,8 @@ int nfs_permission(struct inode *inode, 
 	struct rpc_cred *cred;
 	int res;
 
+	nfs_inc_stats(inode, NFS_VFS_ACCESS);
+
 	if (mask == 0)
 		return 0;
 
diff -X /home/cel/src/linux/dont-diff -Naurp 01-mountstats/fs/nfs/direct.c 
02-nfs-iostat/fs/nfs/direct.c
--- 01-mountstats/fs/nfs/direct.c   2005-03-02 02:38:25.0 -0500
+++ 02-nfs-iostat/fs/nfs/direct.c   2005-03-14 15:26:16.401349000 -0500
@@ -47,6 +47,7 @@
 #include <linux/kref.h>
 
 #include <linux/nfs_fs.h>
+#include <linux/nfs_iostat.h>
 #include <linux/nfs_page.h>
 #include <linux/sunrpc/clnt.h>
 
@@ -354,6 +355,8 @@ static ssize_t nfs_direct_read_seg(struc
 	result = nfs_direct_read_wait(dreq, clnt->cl_intr);
 	rpc_clnt_sigunmask(clnt, &oldset);
 
+	nfs_add_stats(inode, NFS_WIRE_READ_BYTES, result);
+	nfs_add_stats(inode, NFS_DIRECT_READ_BYTES, result);
 	return result;
 }
 
@@ -576,6 +579,8 @@ static ssize_t nfs_direct_write(struct i
 		if (result < size)
 			break;
 	}
+	nfs_add_stats(inode, NFS_WIRE_WRITTEN_BYTES, tot_bytes);
+	nfs_add_stats(inode, NFS_DIRECT_WRITTEN_BYTES, tot_bytes);
 	return tot_bytes;
 }
 
diff -X /home/cel/src/linux/dont-diff -Naurp 01-mountstats/fs/nfs/file.c 
02-nfs-iostat/fs/nfs/file.c
--- 01-mountstats/fs/nfs/file.c 2005-03-02 02:38:38.0 -0500
+++ 02-nfs-iostat/fs/nfs/file.c 2005-03-14 15:42:52.446804000 -0500
@@ -22,6 +22,7 @@
 #include <linux/fcntl.h>
 #include <linux/stat.h>
 #include <linux/nfs_fs.h>
+#include <linux/nfs_iostat.h>
 #include <linux/nfs_mount.h>
 #include <linux/mm.h>
 #include <linux/slab.h>
@@ -86,18 +87,15 @@ static int nfs_check_flags(int flags)
 static int
 nfs_file_open(struct inode *inode, struct file *filp)
 {
-	struct nfs_server *server = NFS_SERVER(inode);
-	int (*open)(struct inode *, struct file *);
 	int res;
 
 	res = nfs_check_flags(filp->f_flags);
 	if (res)
 		return res;
 
+	nfs_inc_stats(inode, NFS_VFS_OPEN);
 	lock_kernel();
-	/* Do NFSv4 open() call */
-	if ((open = server->rpc_ops->file_open) != NULL)
-		res = open(inode, filp);
+	res = NFS_SERVER(inode)->rpc_ops->file_open(inode, filp);
 	unlock_kernel();
 	return res;
 }
@@ -105,6 +103,7 @@ nfs_file_open(struct inode *inode, struc
 static int
 nfs_file_release(struct inode *inode, struct file *filp)
 {
+	nfs_inc_stats(inode, NFS_VFS_CLOSE);
 	return NFS_PROTO(inode)->file_release(inode, filp);
 }
 
@@ -123,6 +122,7

[PATCH 13/13] NFS: Integrate support for processing nfs4 mount options in fs/nfs/super.c

2007-05-21 Thread Chuck Lever
Finally, hook in the new mount option parsing logic.

Signed-off-by: Chuck Lever [EMAIL PROTECTED]
---

 fs/nfs/super.c |   87 
 1 files changed, 19 insertions(+), 68 deletions(-)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index e0acd08..222bb49 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -13,6 +13,8 @@
  *
  *  Split from inode.c by David Howells [EMAIL PROTECTED]
  *
+ *  In-kernel mount option parsing by Chuck Lever [EMAIL PROTECTED]
+ *
  * - superblocks are indexed on server only - all inodes, dentries, etc. 
associated with a
  *   particular server are held in the same superblock
  * - NFS superblocks can have several effective roots to the dentry tree
@@ -1532,7 +1534,6 @@ static int nfs4_parse_options(char *raw, struct 
nfs4_mount_args *mnt)
if (len  80)
goto out_clntaddr_long;
match_strcpy(mnt-clientaddr, args);
-   mnt-nmd.client_addr.data = mnt-clientaddr;
mnt-nmd.client_addr.len = len;
break;
}
@@ -1605,10 +1606,8 @@ static struct nfs4_mount_data 
*nfs4_convert_mount_opts(const char *options)
args-nmd.acdirmax = 60;
 
args-nmd.auth_flavourlen = 0;
-   args-nmd.auth_flavours = args-authflavor;
 
args-nmd.host_addrlen = sizeof(args-addr);
-   args-nmd.host_addr = (struct sockaddr *) args-addr;
 
args-addr.sin_port = htons(NFS_PORT);
 
@@ -1652,6 +1651,7 @@ static int nfs4_validate_mount_data(struct 
nfs4_mount_data **options,
char *ip_addr)
 {
struct nfs4_mount_data *data = *options;
+   struct nfs4_mount_args *args;
char *c;
unsigned len;
 
@@ -1707,25 +1707,26 @@ static int nfs4_validate_mount_data(struct 
nfs4_mount_data **options,
if (IS_ERR(data))
return PTR_ERR(data);
*options = data;
+   args = (struct nfs4_mount_args *) data;
 
-   memcpy(addr, data-host_addr, sizeof(*addr));
-   if (!nfs_verify_server_address((struct sockaddr *) addr,
+   if (!nfs_verify_server_address((struct sockaddr *) args-addr,
data-host_addrlen))
return -EINVAL;
+   memcpy(addr, args-addr, sizeof(*addr));
 
switch (data-auth_flavourlen) {
case 0:
*authflavour = RPC_AUTH_UNIX;
break;
case 1:
-   *authflavour = (rpc_authflavor_t) 
data-auth_flavours[0];
+   *authflavour = (rpc_authflavor_t) args-authflavor;
break;
default:
goto out_inval_auth;
}
 
memset(ip_addr, '\0', data-client_addr.len + 1);
-   strncpy(ip_addr, data-client_addr.data, data-client_addr.len);
+   strncpy(ip_addr, args-clientaddr, data-client_addr.len);
 
/*
 * Split dev_name into hostname:mntpath.
@@ -1804,67 +1805,17 @@ static int nfs4_get_sb(struct file_system_type *fs_type,
struct nfs_fh mntfh;
struct dentry *mntroot;
char *mntpath = NULL, *hostname = NULL, ip_addr[16];
-   void *p;
int error;
 
-	if (data == NULL) {
-		dprintk("%s: missing data argument\n", __FUNCTION__);
-		return -EINVAL;
-	}
-	if (data->version <= 0 || data->version > NFS4_MOUNT_VERSION) {
-		dprintk("%s: bad mount version\n", __FUNCTION__);
-		return -EINVAL;
-	}
-
-	/* We now require that the mount process passes the remote address */
-	if (data->host_addrlen != sizeof(addr))
-		return -EINVAL;
-
-	if (copy_from_user(&addr, data->host_addr, sizeof(addr)))
-		return -EFAULT;
-
-	if (!nfs_verify_server_address((struct sockaddr *) &addr,
-					data->host_addrlen))
-		return -EINVAL;
-
-	/* RFC3530: The default port for NFS is 2049 */
-	if (addr.sin_port == 0)
-		addr.sin_port = htons(NFS_PORT);
-
-	/* Grab the authentication type */
-	authflavour = RPC_AUTH_UNIX;
-	if (data->auth_flavourlen != 0) {
-		if (data->auth_flavourlen != 1) {
-			dprintk("%s: Invalid number of RPC auth flavours %d.\n",
-					__FUNCTION__, data->auth_flavourlen);
-			error = -EINVAL;
-			goto out_err_noserver;
-		}
-
-		if (copy_from_user(&authflavour, data->auth_flavours,
-				   sizeof(authflavour))) {
-			error = -EFAULT;
-			goto out_err_noserver

[PATCH 06/13] NFS: Improve debugging output in NFS in-kernel mount client

2007-05-21 Thread Chuck Lever
Signed-off-by: Chuck Lever [EMAIL PROTECTED]
---

 fs/nfs/mount_clnt.c|   18 +-
 include/linux/nfs_fs.h |1 +
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/mount_clnt.c b/fs/nfs/mount_clnt.c
index f8584ad..81ea782 100644
--- a/fs/nfs/mount_clnt.c
+++ b/fs/nfs/mount_clnt.c
@@ -16,7 +16,7 @@
 #include <linux/nfs_fs.h>
 
 #ifdef RPC_DEBUG
-# define NFSDBG_FACILITY	NFSDBG_ROOT
+# define NFSDBG_FACILITY	NFSDBG_MOUNT
 #endif
 
 static struct rpc_program	mnt_program;
@@ -72,8 +72,8 @@ int nfs_mount(struct sockaddr_in *addr, char *path, struct 
nfs_fh *fh,
 	char			hostname[32];
 	int			status;
 
-	dprintk("NFS:  nfs_mount(%08x:%s)\n",
-			(unsigned)ntohl(addr->sin_addr.s_addr), path);
+	dprintk("NFS: %s: mounting " NIPQUAD_FMT ":%s\n",
+			__FUNCTION__, NIPQUAD(addr->sin_addr.s_addr), path);
 
 	sprintf(hostname, NIPQUAD_FMT, NIPQUAD(addr->sin_addr.s_addr));
 	mnt_clnt = mnt_create(hostname, addr, version, protocol);
@@ -86,10 +86,18 @@ int nfs_mount(struct sockaddr_in *addr, char *path, struct 
nfs_fh *fh,
 	msg.rpc_proc = &mnt_clnt->cl_procinfo[MNTPROC_MNT];
 
 	status = rpc_call_sync(mnt_clnt, &msg, 0);
-	if (status < 0)
+	if (status < 0) {
+		dprintk("NFS: %s: rpc_call_sync returned %d\n",
+				__FUNCTION__, status);
 		return status;
-	if (result.status != 0)
+	}
+	if (result.status != 0) {
+		dprintk("NFS: %s: server returned %d\n",
+				__FUNCTION__, result.status);
 		return -EACCES;
+	}
+	dprintk("NFS: %s: mount request succeeded\n",
+			__FUNCTION__);
 	return 0;
 }
 
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 58f5b77..2f33ef7 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -555,6 +555,7 @@ extern void * nfs_root_data(void);
 #define NFSDBG_ROOT		0x0080
 #define NFSDBG_CALLBACK	0x0100
 #define NFSDBG_CLIENT		0x0200
+#define NFSDBG_MOUNT		0x0400
 #define NFSDBG_ALL		0xFFFF
 
 #ifdef __KERNEL__


[PATCH 09/13] NFS: Implement NFSv2/3 in-kernel mount option parsing

2007-05-21 Thread Chuck Lever
Signed-off-by: Chuck Lever [EMAIL PROTECTED]
---

 fs/nfs/super.c |  130 +++-
 1 files changed, 82 insertions(+), 48 deletions(-)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index a9f698b..7b7cacb 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -935,8 +935,6 @@ static struct nfs_mount_data *nfs_convert_mount_opts(const 
char *options)
if (args == NULL)
return ERR_PTR(-ENOMEM);
 
-   args-nmd.version = 7;
-
args-nmd.flags = (NFS_MOUNT_VER3 | NFS_MOUNT_TCP);
args-nmd.rsize = NFS_MAX_FILE_IO_SIZE;
args-nmd.wsize = NFS_MAX_FILE_IO_SIZE;
@@ -989,71 +987,74 @@ out_invalid:
  * Validate the NFS2/NFS3 mount data
  * - fills in the mount root filehandle
  */
-static int nfs_validate_mount_data(struct nfs_mount_data *data,
-  struct nfs_fh *mntfh)
+static int nfs_validate_mount_data(struct nfs_mount_data **options,
+  struct nfs_fh *mntfh,
+  const char *dev_name)
 {
-	if (data == NULL) {
-		dprintk("%s: missing data argument\n", __FUNCTION__);
-		return -EINVAL;
-	}
+	struct nfs_mount_data *data = *options;
+	unsigned int len;
+	char *c;
+	int status;
 
-	if (data->version <= 0 || data->version > NFS_MOUNT_VERSION) {
-		dprintk("%s: bad mount version\n", __FUNCTION__);
-		return -EINVAL;
-	}
+	if (data == NULL)
+		goto out_no_data;
 
 	switch (data->version) {
-		case 1:
-			data->namlen = 0;
-		case 2:
-			data->bsize  = 0;
-		case 3:
-			if (data->flags & NFS_MOUNT_VER3) {
-				dprintk("%s: mount structure version %d does not support NFSv3\n",
-						__FUNCTION__,
-						data->version);
-				return -EINVAL;
-			}
-			data->root.size = NFS2_FHSIZE;
-			memcpy(data->root.data, data->old_root.data, NFS2_FHSIZE);
-		case 4:
-			if (data->flags & NFS_MOUNT_SECFLAVOUR) {
-				dprintk("%s: mount structure version %d does not support strong security\n",
-						__FUNCTION__,
-						data->version);
-				return -EINVAL;
-			}
-		case 5:
-			memset(data->context, 0, sizeof(data->context));
+	case 1:
+		data->namlen = 0;
+	case 2:
+		data->bsize  = 0;
+	case 3:
+		if (data->flags & NFS_MOUNT_VER3)
+			goto out_no_v3;
+		data->root.size = NFS2_FHSIZE;
+		memcpy(data->root.data, data->old_root.data, NFS2_FHSIZE);
+	case 4:
+		if (data->flags & NFS_MOUNT_SECFLAVOUR)
+			goto out_no_sec;
+	case 5:
+		memset(data->context, 0, sizeof(data->context));
+	case 6:
+		break;
+	default:
+		data = nfs_convert_mount_opts((char *) data);
+		if (IS_ERR(data))
+			return PTR_ERR(data);
+		*options = data;
+
+		c = strchr(dev_name, ':');
+		if (c == NULL)
+			return -EINVAL;
+		len = c - dev_name;
+		if (len > 256)
+			return -EINVAL;
+		strncpy(data->hostname, dev_name, len);
+
+		status = nfs_try_mount(data, ++c);
+		if (status)
+			return -EINVAL;
}
 
-	/* Set the pseudoflavor */
 	if (!(data->flags & NFS_MOUNT_SECFLAVOUR))
 		data->pseudoflavor = RPC_AUTH_UNIX;
 
 #ifndef CONFIG_NFS_V3
-	/* If NFSv3 is not compiled in, return -EPROTONOSUPPORT */
-	if (data->flags & NFS_MOUNT_VER3) {
-		dprintk("%s: NFSv3 not compiled into kernel\n", __FUNCTION__);
-		return -EPROTONOSUPPORT;
-	}
-#endif /* CONFIG_NFS_V3 */
+	if (data->flags & NFS_MOUNT_VER3)
+		goto out_v3_not_compiled;
+#endif /* !CONFIG_NFS_V3 */
 
 	/* We now require that the mount process passes the remote address */
 	if (!nfs_verify_server_address((struct sockaddr *) &data->addr,
 				sizeof(data->addr)))
 		return -EINVAL;
 
-	/* Prepare the root filehandle */
 	if (data->flags & NFS_MOUNT_VER3)
 		mntfh->size = data->root.size;
 	else
 		mntfh->size = NFS2_FHSIZE;
 
-	if (mntfh->size > sizeof(mntfh->data)) {
-		dprintk("%s: invalid root filehandle\n", __FUNCTION__);
-		return

[PATCH 01/13] NFS: Refactor IP address sanity checks in NFS client

2007-05-21 Thread Chuck Lever
Provide mechanism for adding IPv6 address support at some later point.

Signed-off-by: Chuck Lever [EMAIL PROTECTED]
Cc: Aurelien Charbon [EMAIL PROTECTED]
---

 fs/nfs/super.c |   39 ---
 1 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 1fce778..31f7313 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -436,6 +436,28 @@ static void nfs_umount_begin(struct vfsmount *vfsmnt, int 
flags)
 }
 
 /*
+ * Sanity-check a server address provided by the mount command
+ */
+static int nfs_verify_server_address(struct sockaddr *addr, size_t len)
+{
+   if (len  sizeof(struct sockaddr))
+   goto out_invalid;
+
+	switch (addr->sa_family) {
+	case AF_INET: {
+		struct sockaddr_in *sa = (struct sockaddr_in *) addr;
+		if (sa->sin_addr.s_addr != INADDR_ANY)
+			return 1;
+		break;
+	}
+	}
+
+out_invalid:
+	dprintk("NFS: mount program passed an invalid remote address\n");
+	return 0;
+}
+
+/*
  * Validate the NFS2/NFS3 mount data
  * - fills in the mount root filehandle
  */
@@ -490,11 +512,9 @@ static int nfs_validate_mount_data(struct nfs_mount_data 
*data,
 #endif /* CONFIG_NFS_V3 */
 
 	/* We now require that the mount process passes the remote address */
-	if (data->addr.sin_addr.s_addr == INADDR_ANY) {
-		dprintk("%s: mount program didn't pass remote address!\n",
-				__FUNCTION__);
-		return -EINVAL;
-	}
+	if (!nfs_verify_server_address((struct sockaddr *) &data->addr,
+				sizeof(data->addr)))
+		return -EINVAL;
 
 	/* Prepare the root filehandle */
 	if (data->flags & NFS_MOUNT_VER3)
@@ -828,13 +848,10 @@ static int nfs4_get_sb(struct file_system_type *fs_type,
 	if (copy_from_user(&addr, data->host_addr, sizeof(addr)))
 		return -EFAULT;
 
-	if (addr.sin_family != AF_INET ||
-	    addr.sin_addr.s_addr == INADDR_ANY
-	    ) {
-		dprintk("%s: mount program didn't pass remote IP address!\n",
-				__FUNCTION__);
+	if (!nfs_verify_server_address((struct sockaddr *) &addr,
+				data->host_addrlen))
 		return -EINVAL;
-	}
+
/* RFC3530: The default port for NFS is 2049 */
if (addr.sin_port == 0)
addr.sin_port = htons(NFS_PORT);
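
To show where the switch on the address family leaves room for IPv6, here is a user-space sketch of the same check. It is an assumption layered on the patch, not part of it: the AF_INET6 case and the name sk_verify_server_address() are hypothetical, and the length check from the kernel helper is left out.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Return 1 if `addr` names a usable remote server address, 0 otherwise.
 * The AF_INET case follows the patch; the AF_INET6 case is a guess at
 * what a later extension might look like.
 */
int sk_verify_server_address(const struct sockaddr *addr)
{
    assert(addr != NULL);

    switch (addr->sa_family) {
    case AF_INET: {
        const struct sockaddr_in *sa = (const struct sockaddr_in *)addr;
        if (sa->sin_addr.s_addr != htonl(INADDR_ANY))
            return 1;
        break;
    }
    case AF_INET6: {    /* hypothetical extension, not in the patch */
        const struct sockaddr_in6 *sa6 = (const struct sockaddr_in6 *)addr;
        if (!IN6_IS_ADDR_UNSPECIFIED(&sa6->sin6_addr))
            return 1;
        break;
    }
    default:
        break;          /* unknown families are rejected */
    }
    return 0;
}
```

Callers keep passing a generic struct sockaddr pointer, so supporting a new family touches only this one function rather than every place that used to compare against INADDR_ANY directly.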


[PATCH 08/13] NFS: Add functions to parse nfs mount options to fs/nfs/super.c

2007-05-21 Thread Chuck Lever
For NFSv2 and NFSv3 mount options.

Signed-off-by: Chuck Lever [EMAIL PROTECTED]
---

 fs/nfs/super.c |  449 
 1 files changed, 449 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 1974648..a9f698b 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -514,6 +514,455 @@ static void nfs_umount_begin(struct vfsmount *vfsmnt, int 
flags)
shrink_submounts(vfsmnt, nfs_automount_list);
 }
 
+
+static match_table_t nfs_tokens = {
+	{Opt_userspace, "bg"},
+	{Opt_userspace, "fg"},
+	{Opt_soft, "soft"},
+	{Opt_hard, "hard"},
+	{Opt_intr, "intr"},
+	{Opt_nointr, "nointr"},
+	{Opt_posix, "posix"},
+	{Opt_noposix, "noposix"},
+	{Opt_cto, "cto"},
+	{Opt_nocto, "nocto"},
+	{Opt_ac, "ac"},
+	{Opt_noac, "noac"},
+	{Opt_lock, "lock"},
+	{Opt_nolock, "nolock"},
+	{Opt_v2, "v2"},
+	{Opt_v3, "v3"},
+	{Opt_udp, "udp"},
+	{Opt_tcp, "tcp"},
+	{Opt_acl, "acl"},
+	{Opt_noacl, "noacl"},
+
+	{Opt_port, "port=%u"},
+	{Opt_rsize, "rsize=%u"},
+	{Opt_wsize, "wsize=%u"},
+	{Opt_timeo, "timeo=%u"},
+	{Opt_retrans, "retrans=%u"},
+	{Opt_acregmin, "acregmin=%u"},
+	{Opt_acregmax, "acregmax=%u"},
+	{Opt_acdirmin, "acdirmin=%u"},
+	{Opt_acdirmax, "acdirmax=%u"},
+	{Opt_actimeo, "actimeo=%u"},
+	{Opt_userspace, "retry=%u"},
+	{Opt_namelen, "namlen=%u"},
+	{Opt_mountport, "mountport=%u"},
+	{Opt_mountprog, "mountprog=%u"},
+	{Opt_mountvers, "mountvers=%u"},
+	{Opt_nfsprog, "nfsprog=%u"},
+	{Opt_nfsvers, "nfsvers=%u"},
+	{Opt_nfsvers, "vers=%u"},
+
+	{Opt_sec, "sec=%s"},
+	{Opt_proto, "proto=%s"},
+	{Opt_addr, "addr=%s"},
+	{Opt_mounthost, "mounthost=%s"},
+	{Opt_context, "context=%s"},
+
+	{Opt_err, NULL},
+};
+
+static int nfs_parse_options(char *raw, struct nfs_mount_args *mnt)
+{
+	char *p, *string;
+
+	if (!raw) {
+		dprintk("NFS: mount options string was NULL.\n");
+		return 1;
+	}
+
+	while ((p = strsep (&raw, ",")) != NULL) {
+		substring_t args[MAX_OPT_ARGS];
+		int option, token;
+
+		if (!*p)
+			continue;
+		token = match_token(p, nfs_tokens, args);
+
+		dprintk("NFS:   nfs mount option '%s': parsing token %d\n",
+			p, token);
+
+		switch (token) {
+		case Opt_soft:
+			mnt->nmd.flags |= NFS_MOUNT_SOFT;
+			break;
+		case Opt_hard:
+			mnt->nmd.flags &= ~NFS_MOUNT_SOFT;
+			break;
+		case Opt_intr:
+			mnt->nmd.flags |= NFS_MOUNT_INTR;
+			break;
+		case Opt_nointr:
+			mnt->nmd.flags &= ~NFS_MOUNT_INTR;
+			break;
+		case Opt_posix:
+			mnt->nmd.flags |= NFS_MOUNT_POSIX;
+			break;
+		case Opt_noposix:
+			mnt->nmd.flags &= ~NFS_MOUNT_POSIX;
+			break;
+		case Opt_cto:
+			mnt->nmd.flags &= ~NFS_MOUNT_NOCTO;
+			break;
+		case Opt_nocto:
+			mnt->nmd.flags |= NFS_MOUNT_NOCTO;
+			break;
+		case Opt_ac:
+			mnt->nmd.flags &= ~NFS_MOUNT_NOAC;
+			break;
+		case Opt_noac:
+			mnt->nmd.flags |= NFS_MOUNT_NOAC;
+			break;
+		case Opt_lock:
+			mnt->nmd.flags &= ~NFS_MOUNT_NONLM;
+			break;
+		case Opt_nolock:
+			mnt->nmd.flags |= NFS_MOUNT_NONLM;
+			break;
+		case Opt_v2:
+			mnt->nmd.flags &= ~NFS_MOUNT_VER3;
+			break;
+		case Opt_v3:
+			mnt->nmd.flags |= NFS_MOUNT_VER3;
+			break;
+		case Opt_udp:
+			mnt->nmd.flags &= ~NFS_MOUNT_TCP;
+			break;
+		case Opt_tcp:
+			mnt->nmd.flags |= NFS_MOUNT_TCP;
+			break;
+		case Opt_acl:
+			mnt->nmd.flags &= ~NFS_MOUNT_NOACL;
+			break;
+		case Opt_noacl:
+			mnt->nmd.flags |= NFS_MOUNT_NOACL;
+			break;
+
+		case Opt_port:
+			if (match_int(args, &option))
+				return 0;
+			if (option < 0 || option > 65535)
+				return 0;
+			mnt->nmd.addr.sin_port = htons(option);
+			break;
+   case Opt_rsize

[PATCH 10/13] NFS: Add functions to parse nfs4 mount options to fs/nfs/super.c

2007-05-21 Thread Chuck Lever
Add helpers required for parsing nfs4 mount options in the NFS
client.
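
The general shape of such a parser (split on commas, match each piece, set or clear a flag or store a number) can be tried out in user space. The sketch below is an illustration under assumptions: it uses plain strcmp/sscanf instead of the kernel's match_token machinery, the SK_* flags and struct sk_mount_args are hypothetical, and, unlike the tables in these patches, it rejects any option it does not know. It follows the convention visible in the diff of returning 1 on success and 0 on a bad option.

```c
#define _DEFAULT_SOURCE     /* for strsep() on glibc */
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SK_MOUNT_SOFT 0x0001u   /* hypothetical flag values */
#define SK_MOUNT_INTR 0x0002u

struct sk_mount_args {          /* hypothetical result structure */
    unsigned int flags;
    unsigned int rsize;
    uint16_t port_be;           /* port in network byte order */
};

/* Parse "opt,opt=val,..." in place; returns 1 on success, 0 on error. */
int sk_parse_options(char *raw, struct sk_mount_args *mnt)
{
    char *p;

    assert(mnt != NULL);
    if (raw == NULL)
        return 1;               /* nothing to parse is not an error */

    while ((p = strsep(&raw, ",")) != NULL) {
        unsigned int val;
        char tail;              /* catches trailing junk after a number */

        if (*p == '\0')
            continue;           /* tolerate empty fields such as ",," */

        if (strcmp(p, "soft") == 0)
            mnt->flags |= SK_MOUNT_SOFT;
        else if (strcmp(p, "hard") == 0)
            mnt->flags &= ~SK_MOUNT_SOFT;
        else if (strcmp(p, "intr") == 0)
            mnt->flags |= SK_MOUNT_INTR;
        else if (strcmp(p, "nointr") == 0)
            mnt->flags &= ~SK_MOUNT_INTR;
        else if (sscanf(p, "rsize=%u%c", &val, &tail) == 1)
            mnt->rsize = val;
        else if (sscanf(p, "port=%u%c", &val, &tail) == 1) {
            if (val > 65535)
                return 0;
            /* A port is 16 bits wide, so convert with htons(). */
            mnt->port_be = htons((uint16_t)val);
        } else
            return 0;           /* unknown or malformed option */
    }
    return 1;
}
```

Note that strsep() modifies the string it is given, so the caller has to pass a writable copy of the option string.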

Signed-off-by: Chuck Lever [EMAIL PROTECTED]
---

 fs/nfs/super.c |  290 
 1 files changed, 290 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 7b7cacb..927c1c2 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -1332,6 +1332,296 @@ error_splat_super:
 
 #ifdef CONFIG_NFS_V4
 
+static match_table_t nfs4_tokens = {
+	{Opt_userspace, "bg"},
+	{Opt_userspace, "fg"},
+	{Opt_soft, "soft"},
+	{Opt_hard, "hard"},
+	{Opt_intr, "intr"},
+	{Opt_nointr, "nointr"},
+	{Opt_cto, "cto"},
+	{Opt_nocto, "nocto"},
+	{Opt_ac, "ac"},
+	{Opt_noac, "noac"},
+
+	{Opt_port, "port=%u"},
+	{Opt_rsize, "rsize=%u"},
+	{Opt_wsize, "wsize=%u"},
+	{Opt_timeo, "timeo=%u"},
+	{Opt_retrans, "retrans=%u"},
+	{Opt_acregmin, "acregmin=%u"},
+	{Opt_acregmax, "acregmax=%u"},
+	{Opt_acdirmin, "acdirmin=%u"},
+	{Opt_acdirmax, "acdirmax=%u"},
+	{Opt_actimeo, "actimeo=%u"},
+	{Opt_userspace, "retry=%u"},
+
+	{Opt_sec, "sec=%s"},
+	{Opt_proto, "proto=%s"},
+	{Opt_addr, "addr=%s"},
+	{Opt_clientaddr, "clientaddr=%s"},
+
+	{Opt_err, NULL},
+};
+
+static int nfs4_parse_options(char *raw, struct nfs4_mount_args *mnt)
+{
+	char *p, *string;
+
+	if (!raw)
+		return 1;
+
+	while ((p = strsep (&raw, ",")) != NULL) {
+		substring_t args[MAX_OPT_ARGS];
+		int option, token;
+
+		if (!*p)
+			continue;
+		token = match_token(p, nfs4_tokens, args);
+
+		dprintk("NFS:   nfs4 mount option '%s': parsing token %d\n",
+			p, token);
+
+		switch (token) {
+		case Opt_soft:
+			mnt->nmd.flags |= NFS4_MOUNT_SOFT;
+			break;
+		case Opt_hard:
+			mnt->nmd.flags &= ~NFS4_MOUNT_SOFT;
+			break;
+		case Opt_intr:
+			mnt->nmd.flags |= NFS4_MOUNT_INTR;
+			break;
+		case Opt_nointr:
+			mnt->nmd.flags &= ~NFS4_MOUNT_INTR;
+			break;
+		case Opt_cto:
+			mnt->nmd.flags &= ~NFS4_MOUNT_NOCTO;
+			break;
+		case Opt_nocto:
+			mnt->nmd.flags |= NFS4_MOUNT_NOCTO;
+			break;
+		case Opt_ac:
+			mnt->nmd.flags &= ~NFS4_MOUNT_NOAC;
+			break;
+		case Opt_noac:
+			mnt->nmd.flags |= NFS4_MOUNT_NOAC;
+			break;
+
+		case Opt_port:
+			if (match_int(args, &option))
+				return 0;
+			if (option < 0 || option > 65535)
+				return 0;
+			mnt->addr.sin_port = htonl(option);
+			break;
+		case Opt_rsize:
+			if (match_int(args, &mnt->nmd.rsize))
+				return 0;
+			break;
+		case Opt_wsize:
+			if (match_int(args, &mnt->nmd.wsize))
+				return 0;
+			break;
+		case Opt_timeo:
+			if (match_int(args, &mnt->nmd.timeo))
+				return 0;
+			break;
+		case Opt_retrans:
+			if (match_int(args, &mnt->nmd.retrans))
+				return 0;
+			break;
+		case Opt_acregmin:
+			if (match_int(args, &mnt->nmd.acregmin))
+				return 0;
+			break;
+		case Opt_acregmax:
+			if (match_int(args, &mnt->nmd.acregmax))
+				return 0;
+			break;
+		case Opt_acdirmin:
+			if (match_int(args, &mnt->nmd.acdirmin))
+				return 0;
+			break;
+		case Opt_acdirmax:
+			if (match_int(args, &mnt->nmd.acdirmax))
+				return 0;
+			break;
+		case Opt_actimeo:
+			if (match_int(args, &option))
+				return 0;
+			if (option < 0)
+				return 0;
+			mnt->nmd.acregmin =
+			mnt->nmd.acregmax =
+			mnt->nmd.acdirmin =
+			mnt->nmd.acdirmax = option;
+			break;
+
+		case Opt_proto: {
+			string = match_strdup(args

[PATCH 04/13] NFS: Remake nfsroot_mount as a permanent part of NFS client

2007-05-21 Thread Chuck Lever
In preparation for supporting NFSv2 and NFSv3 mount option handling in the
kernel NFS client, convert mount_clnt.c to be a permanent part of the NFS
client, instead of built only when CONFIG_ROOT_NFS is enabled.

Signed-off-by: Chuck Lever [EMAIL PROTECTED]
---

 fs/nfs/Makefile|4 ++--
 fs/nfs/mount_clnt.c|   18 +-
 fs/nfs/nfsroot.c   |2 +-
 include/linux/nfs_fs.h |4 +---
 4 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index f4580b4..b55cb23 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -6,8 +6,8 @@ obj-$(CONFIG_NFS_FS) += nfs.o
 
 nfs-y  := client.o dir.o file.o getroot.o inode.o super.o nfs2xdr.o \
   pagelist.o proc.o read.o symlink.o unlink.o \
-  write.o namespace.o
-nfs-$(CONFIG_ROOT_NFS) += nfsroot.o mount_clnt.o  
+  write.o namespace.o mount_clnt.o
+nfs-$(CONFIG_ROOT_NFS) += nfsroot.o
 nfs-$(CONFIG_NFS_V3)   += nfs3proc.o nfs3xdr.o
 nfs-$(CONFIG_NFS_V3_ACL)   += nfs3acl.o
 nfs-$(CONFIG_NFS_V4)   += nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o \
diff --git a/fs/nfs/mount_clnt.c b/fs/nfs/mount_clnt.c
index ca5a266..82a8536 100644
--- a/fs/nfs/mount_clnt.c
+++ b/fs/nfs/mount_clnt.c
@@ -37,12 +37,20 @@ struct mnt_fhstatus {
struct nfs_fh * fh;
 };
 
-/*
- * Obtain an NFS file handle for the given host and path
+/**
+ * nfs_mount - Obtain an NFS file handle for the given host and path
+ * @addr: pointer to server's address
+ * @path: pointer to string containing export path to mount
+ * @fh: pointer to location to place returned file handle
+ * @version: mount version to use for this request
+ * @protocol: transport protocol to use for this request
+ *
+ * Uses default timeout parameters specified by underlying transport.
+ *
+ * XXX: Needs to support IPv6
  */
-int
-nfsroot_mount(struct sockaddr_in *addr, char *path, struct nfs_fh *fh,
-   int version, int protocol)
+int nfs_mount(struct sockaddr_in *addr, char *path, struct nfs_fh *fh,
+ int version, int protocol)
 {
struct rpc_clnt *mnt_clnt;
struct mnt_fhstatus result = {
diff --git a/fs/nfs/nfsroot.c b/fs/nfs/nfsroot.c
index f0db470..a52c891 100644
--- a/fs/nfs/nfsroot.c
+++ b/fs/nfs/nfsroot.c
@@ -496,7 +496,7 @@ static int __init root_nfs_get_handle(void)
 					NFS_MNT3_VERSION : NFS_MNT_VERSION;
 
 	set_sockaddr(&sin, servaddr, htons(mount_port));
-	status = nfsroot_mount(&sin, nfs_path, &fh, version, protocol);
+	status = nfs_mount(&sin, nfs_path, &fh, version, protocol);
 	if (status < 0)
 		printk(KERN_ERR "Root-NFS: Server returned error %d "
 				"while mounting %s\n", status, nfs_path);
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 0543439..58f5b77 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -496,10 +496,8 @@ static inline void nfs3_forget_cached_acls(struct inode *inode)
 
 /*
  * linux/fs/mount_clnt.c
- * (Used only by nfsroot module)
  */
-extern int  nfsroot_mount(struct sockaddr_in *, char *, struct nfs_fh *,
-   int, int);
+extern int  nfs_mount(struct sockaddr_in *, char *, struct nfs_fh *, int, int);
 
 /*
  * inline functions

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 05/13] NFS: Clean up in-kernel NFS mount

2007-05-21 Thread Chuck Lever
Clean up white space and coding conventions.

Signed-off-by: Chuck Lever [EMAIL PROTECTED]
---

 fs/nfs/mount_clnt.c |  132 ---
 1 files changed, 63 insertions(+), 69 deletions(-)

diff --git a/fs/nfs/mount_clnt.c b/fs/nfs/mount_clnt.c
index 82a8536..f8584ad 100644
--- a/fs/nfs/mount_clnt.c
+++ b/fs/nfs/mount_clnt.c
@@ -1,7 +1,5 @@
 /*
- * linux/fs/nfs/mount_clnt.c
- *
- * MOUNT client to support NFSroot.
+ * In-kernel MOUNT protocol client
  *
  * Copyright (C) 1997, Olaf Kirch [EMAIL PROTECTED]
  */
@@ -21,22 +19,33 @@
 # define NFSDBG_FACILITY   NFSDBG_ROOT
 #endif
 
-/*
-#define MOUNT_PROGRAM  15
-#define MOUNT_VERSION  1
-#define MOUNT_MNT  1
-#define MOUNT_UMNT 3
- */
-
-static struct rpc_clnt *   mnt_create(char *, struct sockaddr_in *,
-   int, int);
 static struct rpc_program  mnt_program;
 
 struct mnt_fhstatus {
-   unsigned intstatus;
-   struct nfs_fh * fh;
+   u32 status;
+   struct nfs_fh *fh;
 };
 
+static struct rpc_clnt *mnt_create(char *hostname,
+				   struct sockaddr_in *srvaddr,
+				   int version,
+				   int protocol)
+{
+	struct rpc_create_args args = {
+		.protocol	= protocol,
+		.address	= (struct sockaddr *)srvaddr,
+		.addrsize	= sizeof(*srvaddr),
+		.servername	= hostname,
+		.program	= &mnt_program,
+		.version	= version,
+		.authflavor	= RPC_AUTH_UNIX,
+		.flags		= (RPC_CLNT_CREATE_ONESHOT |
+				   RPC_CLNT_CREATE_INTR),
+	};
+
+	return rpc_create(&args);
+}
+
 /**
  * nfs_mount - Obtain an NFS file handle for the given host and path
  * @addr: pointer to server's address
@@ -66,7 +75,7 @@ int nfs_mount(struct sockaddr_in *addr, char *path, struct nfs_fh *fh,
 	dprintk("NFS:  nfs_mount(%08x:%s)\n",
 			(unsigned)ntohl(addr->sin_addr.s_addr), path);
 
-	sprintf(hostname, "%u.%u.%u.%u", NIPQUAD(addr->sin_addr.s_addr));
+	sprintf(hostname, NIPQUAD_FMT, NIPQUAD(addr->sin_addr.s_addr));
 	mnt_clnt = mnt_create(hostname, addr, version, protocol);
 	if (IS_ERR(mnt_clnt))
 		return PTR_ERR(mnt_clnt);
@@ -77,33 +86,18 @@ int nfs_mount(struct sockaddr_in *addr, char *path, struct nfs_fh *fh,
 	msg.rpc_proc = &mnt_clnt->cl_procinfo[MNTPROC_MNT];
 
 	status = rpc_call_sync(mnt_clnt, &msg, 0);
-	return status < 0? status : (result.status? -EACCES : 0);
-}
-
-static struct rpc_clnt *
-mnt_create(char *hostname, struct sockaddr_in *srvaddr, int version,
-		int protocol)
-{
-	struct rpc_create_args args = {
-		.protocol	= protocol,
-		.address	= (struct sockaddr *)srvaddr,
-		.addrsize	= sizeof(*srvaddr),
-		.servername	= hostname,
-		.program	= &mnt_program,
-		.version	= version,
-		.authflavor	= RPC_AUTH_UNIX,
-		.flags		= (RPC_CLNT_CREATE_ONESHOT |
-				   RPC_CLNT_CREATE_INTR),
-	};
-
-	return rpc_create(&args);
+	if (status < 0)
+		return status;
+	if (result.status != 0)
+		return -EACCES;
+	return 0;
 }
 
 /*
  * XDR encode/decode functions for MOUNT
  */
-static int
-xdr_encode_dirpath(struct rpc_rqst *req, __be32 *p, const char *path)
+static int xdr_encode_dirpath(struct rpc_rqst *req, __be32 *p,
+ const char *path)
 {
p = xdr_encode_string(p, path);
 
@@ -111,8 +105,8 @@ xdr_encode_dirpath(struct rpc_rqst *req, __be32 *p, const char *path)
 	return 0;
 }
 
-static int
-xdr_decode_fhstatus(struct rpc_rqst *req, __be32 *p, struct mnt_fhstatus *res)
+static int xdr_decode_fhstatus(struct rpc_rqst *req, __be32 *p,
+			       struct mnt_fhstatus *res)
 {
 	struct nfs_fh *fh = res->fh;
 
@@ -123,8 +117,8 @@ xdr_decode_fhstatus(struct rpc_rqst *req, __be32 *p, struct mnt_fhstatus *res)
 	return 0;
 }
 
-static int
-xdr_decode_fhstatus3(struct rpc_rqst *req, __be32 *p, struct mnt_fhstatus *res)
+static int xdr_decode_fhstatus3(struct rpc_rqst *req, __be32 *p,
+				struct mnt_fhstatus *res)
 {
 	struct nfs_fh *fh = res->fh;
 
@@ -143,53 +137,53 @@ xdr_decode_fhstatus3(struct rpc_rqst *req, __be32 *p, struct mnt_fhstatus *res)
 #define MNT_fhstatus_sz		(1 + 8)
 #define MNT_fhstatus3_sz	(1 + 16)
 
-static struct rpc_procinfo mnt_procedures[] = {
-[MNTPROC_MNT] = {
- .p_proc   = MNTPROC_MNT,
- .p_encode = (kxdrproc_t

[PATCH 03/13] SUNRPC: Rename rpcb_getport to be consistent with new rpcb_getport_sync name

2007-05-21 Thread Chuck Lever
Clean up, for consistency.  Rename rpcb_getport as rpcb_getport_async, to
match the naming scheme of rpcb_getport_sync.

Signed-off-by: Chuck Lever [EMAIL PROTECTED]
---

 include/linux/sunrpc/clnt.h |2 +-
 net/sunrpc/rpcb_clnt.c  |   37 +++--
 net/sunrpc/xprtsock.c   |4 ++--
 3 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index c51bc8c..9bea7b5 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -124,8 +124,8 @@ int rpc_destroy_client(struct rpc_clnt *);
 void		rpc_release_client(struct rpc_clnt *);
 
 int		rpcb_register(u32, u32, int, unsigned short, int *);
-void		rpcb_getport(struct rpc_task *);
 int		rpcb_getport_sync(struct sockaddr_in *, __u32, __u32, int);
+void		rpcb_getport_async(struct rpc_task *);
 
 void		rpc_call_setup(struct rpc_task *, struct rpc_message *, int);
 
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 5a52604..905ba5a 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -298,13 +298,13 @@ int rpcb_getport_sync(struct sockaddr_in *sin, __u32 prog,
 EXPORT_SYMBOL_GPL(rpcb_getport_sync);
 
 /**
- * rpcb_getport - obtain the port for a given RPC service on a given host
+ * rpcb_getport_async - obtain the port for a given RPC service on a given host
  * @task: task that is waiting for portmapper request
  *
  * This one can be called for an ongoing RPC request, and can be used in
  * an async (rpciod) context.
  */
-void rpcb_getport(struct rpc_task *task)
+void rpcb_getport_async(struct rpc_task *task)
 {
 	struct rpc_clnt *clnt = task->tk_client;
 	int bind_version;
@@ -315,17 +315,17 @@ void rpcb_getport(struct rpc_task *task)
 	struct sockaddr addr;
 	int status;
 
-	dprintk("RPC: %5u rpcb_getport(%s, %u, %u, %d)\n",
-			task->tk_pid, clnt->cl_server,
-			clnt->cl_prog, clnt->cl_vers, xprt->prot);
+	dprintk("RPC: %5u %s(%s, %u, %u, %d)\n",
+		task->tk_pid, __FUNCTION__,
+		clnt->cl_server, clnt->cl_prog, clnt->cl_vers, xprt->prot);
 
 	/* Autobind on cloned rpc clients is discouraged */
 	BUG_ON(clnt->cl_parent != clnt);
 
 	if (xprt_test_and_set_binding(xprt)) {
 		status = -EACCES;	/* tell caller to check again */
-		dprintk("RPC: %5u rpcb_getport waiting for another binder\n",
-				task->tk_pid);
+		dprintk("RPC: %5u %s: waiting for another binder\n",
+			task->tk_pid, __FUNCTION__);
 		goto bailout_nowake;
 	}
 
@@ -336,27 +336,28 @@ void rpcb_getport(struct rpc_task *task)
 	/* Someone else may have bound if we slept */
 	if (xprt_bound(xprt)) {
 		status = 0;
-		dprintk("RPC: %5u rpcb_getport already bound\n", task->tk_pid);
+		dprintk("RPC: %5u %s: already bound\n",
+			task->tk_pid, __FUNCTION__);
 		goto bailout_nofree;
 	}
 
 	if (rpcb_next_version[xprt->bind_index].rpc_proc == NULL) {
 		xprt->bind_index = 0;
 		status = -EACCES;	/* tell caller to try again later */
-		dprintk("RPC: %5u rpcb_getport no more getport versions "
-				"available\n", task->tk_pid);
+		dprintk("RPC: %5u %s: no more getport versions available\n",
+			task->tk_pid, __FUNCTION__);
 		goto bailout_nofree;
 	}
 	bind_version = rpcb_next_version[xprt->bind_index].rpc_vers;
 
-	dprintk("RPC: %5u rpcb_getport trying rpcbind version %u\n",
-			task->tk_pid, bind_version);
+	dprintk("RPC: %5u %s: trying rpcbind version %u\n",
+		task->tk_pid, __FUNCTION__, bind_version);
 
 	map = kzalloc(sizeof(struct rpcbind_args), GFP_ATOMIC);
 	if (!map) {
 		status = -ENOMEM;
-		dprintk("RPC: %5u rpcb_getport no memory available\n",
-				task->tk_pid);
+		dprintk("RPC: %5u %s: no memory available\n",
+			task->tk_pid, __FUNCTION__);
 		goto bailout_nofree;
 	}
 	map->r_prog = clnt->cl_prog;
@@ -374,16 +375,16 @@ void rpcb_getport(struct rpc_task *task)
 	rpcb_clnt = rpcb_create(clnt->cl_server, &addr, xprt->prot, bind_version, 0);
 	if (IS_ERR(rpcb_clnt)) {
 		status = PTR_ERR(rpcb_clnt);
-		dprintk("RPC: %5u rpcb_getport rpcb_create failed, error %ld\n",
-				task->tk_pid, PTR_ERR(rpcb_clnt));
+		dprintk("RPC: %5u %s: rpcb_create failed, error %ld\n",
+			task->tk_pid, __FUNCTION__, PTR_ERR(rpcb_clnt));
 		goto bailout;
 	}
 
 	child = rpc_run_task(rpcb_clnt, RPC_TASK_ASYNC

[PATCH 07/13] NFS: New infrastructure for NFS client in-kernel mount option parsing

2007-05-21 Thread Chuck Lever
Add some data structures and definitions to support parsing NFS mount
options in the kernel NFS client.

Signed-off-by: Chuck Lever [EMAIL PROTECTED]
---

 fs/nfs/super.c |   79 
 1 files changed, 79 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 31f7313..1974648 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -45,6 +45,7 @@
 #include <linux/inet.h>
 #include <linux/nfs_xdr.h>
 #include <linux/magic.h>
+#include <linux/parser.h>
 
 #include <asm/system.h>
 #include <asm/uaccess.h>
@@ -57,6 +58,84 @@
 
 #define NFSDBG_FACILITY		NFSDBG_VFS
 
+
+struct nfs_mount_args {
+   struct nfs_mount_data nmd;
+   unsigned int nfsprog;
+   unsigned use_mnthost;
+   struct sockaddr_in mnthost;
+   unsigned int mntprog;
+   unsigned int mntvers;
+   unsigned short mntport;
+};
+
+struct nfs4_mount_args {
+   struct nfs4_mount_data nmd;
+   struct sockaddr_in addr;
+   char clientaddr[16];
+   int authflavor;
+};
+
+enum {
+   /* Mount options that take no arguments */
+   Opt_soft, Opt_hard,
+   Opt_intr, Opt_nointr,
+   Opt_posix, Opt_noposix,
+   Opt_cto, Opt_nocto,
+   Opt_ac, Opt_noac,
+   Opt_lock, Opt_nolock,
+   Opt_v2, Opt_v3,
+   Opt_udp, Opt_tcp,
+   Opt_acl, Opt_noacl,
+
+   /* Mount options that take integer arguments */
+   Opt_port,
+   Opt_rsize, Opt_wsize,
+   Opt_timeo, Opt_retrans,
+   Opt_acregmin, Opt_acregmax,
+   Opt_acdirmin, Opt_acdirmax,
+   Opt_actimeo,
+   Opt_namelen,
+   Opt_mountport,
+   Opt_mountprog, Opt_mountvers,
+   Opt_nfsprog, Opt_nfsvers,
+
+   /* Mount options that take string arguments */
+   Opt_sec, Opt_proto, Opt_addr,
+   Opt_mounthost, Opt_clientaddr, Opt_context,
+
+   /* Mount options that are ignored */
+   Opt_userspace, Opt_deprecated,
+
+   Opt_err,
+};
+
+enum {
+   Opt_sec_none, Opt_sec_sys,
+   Opt_sec_krb5, Opt_sec_krb5i, Opt_sec_krb5p,
+   Opt_sec_lkey, Opt_sec_lkeyi, Opt_sec_lkeyp,
+   Opt_sec_spkm, Opt_sec_spkmi, Opt_sec_spkmp,
+
+   Opt_sec_err,
+};
+
+static match_table_t nfs_sec_tokens = {
+	{Opt_sec_none, "none"},
+	{Opt_sec_none, "null"},
+	{Opt_sec_sys, "sys"},
+
+	{Opt_sec_krb5, "krb5"},
+	{Opt_sec_krb5i, "krb5i"},
+	{Opt_sec_krb5p, "krb5p"},
+
+	{Opt_sec_lkey, "lkey"},
+	{Opt_sec_lkeyi, "lkeyi"},
+	{Opt_sec_lkeyp, "lkeyp"},
+
+	{Opt_sec_err, NULL},
+};
+
+
 static void nfs_umount_begin(struct vfsmount *, int);
 static int  nfs_statfs(struct dentry *, struct kstatfs *);
 static int  nfs_show_options(struct seq_file *, struct vfsmount *);



[PATCH 00/13] Support NFS mount option parsing in the kernel

2007-05-21 Thread Chuck Lever
This patch series introduces support for parsing NFS mount options in the
kernel, similar to support that exists for many other Linux file systems such
as ext3, autofs, fat, cifs, hfs, and ocfs2.

I'd like to integrate this patch set into -mm to encourage wide review and
perhaps get some penetration testing before moving forward with integration
into the mainline kernel.

Future enhancements might include caching connections to mountd so we don't
use up so many privileged ports during mount storms, removing similar
infrastructure in NFSROOT in favor of this implementation, and support for NFS
over IPv6 and RDMA.
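
As a rough illustration of the table-driven approach this series takes, here is a minimal userspace sketch.  It is not the kernel code: the kernel patches use match_token() and match_int() from linux/parser.h, while everything named below (opt_table, match_opt, parse_mount_opts, struct mount_opts) is hypothetical and covers only four options.  The return convention (1 on success, 0 on error) mirrors the one used in the posted patches.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option identifiers, loosely modelled on the Opt_* enum. */
enum { OPT_SOFT, OPT_HARD, OPT_RSIZE, OPT_WSIZE, OPT_ERR };

/* One table entry: a keyword and whether it takes "=<integer>". */
struct opt_entry {
	int token;
	const char *name;
	int takes_int;
};

static const struct opt_entry opt_table[] = {
	{ OPT_SOFT,  "soft",  0 },
	{ OPT_HARD,  "hard",  0 },
	{ OPT_RSIZE, "rsize", 1 },
	{ OPT_WSIZE, "wsize", 1 },
};

struct mount_opts {
	int soft;
	long rsize;
	long wsize;
};

/* Match one "name" or "name=value" item; OPT_ERR if unknown or malformed. */
static int match_opt(const char *item, long *value)
{
	size_t i;

	for (i = 0; i < sizeof(opt_table) / sizeof(opt_table[0]); i++) {
		size_t n = strlen(opt_table[i].name);
		char *end;

		if (strncmp(item, opt_table[i].name, n) != 0)
			continue;
		if (!opt_table[i].takes_int) {
			if (item[n] == '\0')
				return opt_table[i].token;
			continue;
		}
		if (item[n] != '=' || item[n + 1] == '\0')
			continue;
		*value = strtol(item + n + 1, &end, 10);
		if (*end != '\0' || *value < 0)
			return OPT_ERR;
		return opt_table[i].token;
	}
	return OPT_ERR;
}

/* Parse a writable, comma-separated string. Returns 1 on success, 0 on error. */
static int parse_mount_opts(char *raw, struct mount_opts *out)
{
	assert(out != NULL);

	while (raw != NULL) {
		char *p = raw;
		char *comma = strchr(p, ',');
		long value = 0;

		if (comma != NULL) {
			*comma = '\0';
			raw = comma + 1;
		} else {
			raw = NULL;
		}
		if (*p == '\0')
			continue;	/* skip empty items such as ",," */

		switch (match_opt(p, &value)) {
		case OPT_SOFT:
			out->soft = 1;
			break;
		case OPT_HARD:
			out->soft = 0;
			break;
		case OPT_RSIZE:
			out->rsize = value;
			break;
		case OPT_WSIZE:
			out->wsize = value;
			break;
		default:
			return 0;
		}
	}
	return 1;
}
```

Passing a writable copy of "soft,rsize=8192,,wsize=4096" should fill in all three fields; an unknown keyword or a non-numeric value makes the whole parse fail rather than being silently ignored.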

-- 
corporate:chuck dot lever at oracle dot com


[PATCH 12/13] NFS: More nfs4 in-kernel mount option parsing infrastructure

2007-05-21 Thread Chuck Lever
Add function for switching between an nfs4_mount_data structure from user
space (the current nfs4 mount mechanism) and generating an nfs4_mount_data
structure from a text string containing nfs4 mount options.

Signed-off-by: Chuck Lever [EMAIL PROTECTED]
---

 fs/nfs/super.c |  123 
 1 files changed, 123 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 8585fa5..e0acd08 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -1643,6 +1643,129 @@ static void *nfs_copy_user_string(char *dst, struct nfs_string *src, int maxlen)
 	return dst;
 }
 
+static int nfs4_validate_mount_data(struct nfs4_mount_data **options,
+				    const char *dev_name,
+				    struct sockaddr_in *addr,
+				    rpc_authflavor_t *authflavour,
+				    char **hostname,
+				    char **mntpath,
+				    char *ip_addr)
+{
+	struct nfs4_mount_data *data = *options;
+	char *c;
+	unsigned len;
+
+	if (data == NULL) {
+		dprintk("%s: missing data argument\n", __FUNCTION__);
+		return -EINVAL;
+	}
+
+	switch (data->version) {
+	case 1:
+		if (data->host_addrlen != sizeof(*addr))
+			return -EINVAL;
+		if (copy_from_user(addr, data->host_addr, sizeof(*addr)))
+			return -EFAULT;
+		if (addr->sin_port == 0)
+			addr->sin_port = htons(NFS_PORT);
+		if (!nfs_verify_server_address((struct sockaddr *) addr,
+					       data->host_addrlen))
+			return -EINVAL;
+
+		switch (data->auth_flavourlen) {
+		case 0:
+			*authflavour = RPC_AUTH_UNIX;
+			break;
+		case 1:
+			if (copy_from_user(authflavour, data->auth_flavours,
+					   sizeof(*authflavour)))
+				return -EFAULT;
+		default:
+			goto out_inval_auth;
+		}
+
+		c = nfs_copy_user_string(ip_addr, &data->client_addr, 80);
+		if (IS_ERR(c))
+			return PTR_ERR(c);
+
+		c = nfs_copy_user_string(NULL, &data->hostname, 256);
+		if (IS_ERR(c))
+			return PTR_ERR(c);
+		*hostname = c;
+
+		c = nfs_copy_user_string(NULL, &data->mnt_path, 1024);
+		if (IS_ERR(c)) {
+			kfree(*hostname);
+			return PTR_ERR(c);
+		}
+		*mntpath = c;
+		dprintk("MNTPATH: %s\n", *mntpath);
+
+		return 0;
+	default:
+		data = nfs4_convert_mount_opts((char *) data);
+		if (IS_ERR(data))
+			return PTR_ERR(data);
+		*options = data;
+
+		memcpy(addr, data->host_addr, sizeof(*addr));
+		if (!nfs_verify_server_address((struct sockaddr *) addr,
+						data->host_addrlen))
+			return -EINVAL;
+
+		switch (data->auth_flavourlen) {
+		case 0:
+			*authflavour = RPC_AUTH_UNIX;
+			break;
+		case 1:
+			*authflavour = (rpc_authflavor_t) data->auth_flavours[0];
+			break;
+		default:
+			goto out_inval_auth;
+		}
+
+		memset(ip_addr, '\0', data->client_addr.len + 1);
+		strncpy(ip_addr, data->client_addr.data, data->client_addr.len);
+
+		/*
+		 * Split dev_name into hostname:mntpath.
+		 */
+		c = strchr(dev_name, ':');
+		if (c == NULL)
+			return -EINVAL;
+		/* while calculating len, pretend ':' is '\0' */
+		len = c - dev_name;
+		if (len > 256)
+			return -EINVAL;
+		*hostname = kzalloc(len, GFP_KERNEL);
+		if (*hostname == NULL)
+			return -ENOMEM;
+		strncpy(*hostname, dev_name, len - 1);
+
+		c++;			/* step over the ':' */
+		len = strlen(c);
+		if (len > 1023) {
+			kfree(*hostname);
+			return -EINVAL;
+		}
+		*mntpath = kzalloc(len + 1, GFP_KERNEL);
+		if (*mntpath == NULL) {
+			kfree(*hostname);
+			return -ENOMEM;
+		}
+		strncpy(*mntpath, c, len);
+
+		dprintk("MNTPATH: %s\n", *mntpath

[PATCH 02/13] SUNRPC: Rename rpcb_getport_external routine

2007-05-21 Thread Chuck Lever
In preparation for handling NFS mount option parsing in the kernel,
rename rpcb_getport_external as rpcb_get_port_sync, and make it available
always (instead of only when CONFIG_ROOT_NFS is enabled).

Signed-off-by: Chuck Lever [EMAIL PROTECTED]
---

 fs/nfs/nfsroot.c|2 +-
 include/linux/sunrpc/clnt.h |7 ++-
 net/sunrpc/rpcb_clnt.c  |   21 +++--
 3 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/nfsroot.c b/fs/nfs/nfsroot.c
index 49d1008..f0db470 100644
--- a/fs/nfs/nfsroot.c
+++ b/fs/nfs/nfsroot.c
@@ -428,7 +428,7 @@ static int __init root_nfs_getport(int program, int version, int proto)
 	printk(KERN_NOTICE "Looking up port of RPC %d/%d on %u.%u.%u.%u\n",
 		program, version, NIPQUAD(servaddr));
 	set_sockaddr(&sin, servaddr, 0);
-	return rpcb_getport_external(&sin, program, version, proto);
+	return rpcb_getport_sync(&sin, program, version, proto);
 }
 
 
diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index 6661142..c51bc8c 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -122,8 +122,10 @@ struct rpc_clnt *rpc_clone_client(struct rpc_clnt *);
 int		rpc_shutdown_client(struct rpc_clnt *);
 int		rpc_destroy_client(struct rpc_clnt *);
 void		rpc_release_client(struct rpc_clnt *);
+
 int		rpcb_register(u32, u32, int, unsigned short, int *);
 void		rpcb_getport(struct rpc_task *);
+int		rpcb_getport_sync(struct sockaddr_in *, __u32, __u32, int);
 
 void		rpc_call_setup(struct rpc_task *, struct rpc_message *, int);
 
@@ -142,10 +144,5 @@ int		rpc_ping(struct rpc_clnt *clnt, int flags);
 size_t		rpc_peeraddr(struct rpc_clnt *, struct sockaddr *, size_t);
 char *		rpc_peeraddr2str(struct rpc_clnt *, enum rpc_display_format_t);
 
-/*
- * Helper function for NFSroot support
- */
-int		rpcb_getport_external(struct sockaddr_in *, __u32, __u32, int);
-
 #endif /* __KERNEL__ */
 #endif /* _LINUX_SUNRPC_CLNT_H */
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 6c7aa8a..5a52604 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -12,6 +12,8 @@
  *  Copyright (C) 1996, Olaf Kirch [EMAIL PROTECTED]
  */
 
+#include <linux/module.h>
+
 #include <linux/types.h>
 #include <linux/socket.h>
 #include <linux/kernel.h>
@@ -246,21 +248,20 @@ int rpcb_register(u32 prog, u32 vers, int prot, unsigned short port, int *okay)
return error;
 }
 
-#ifdef CONFIG_ROOT_NFS
 /**
- * rpcb_getport_external - obtain the port for an RPC service on a given host
+ * rpcb_getport_sync - obtain the port for an RPC service on a given host
  * @sin: address of remote peer
  * @prog: RPC program number to bind
  * @vers: RPC version number to bind
  * @prot: transport protocol to use to make this request
  *
  * Called from outside the RPC client in a synchronous task context.
+ * Uses default timeout parameters specified by underlying transport.
  *
- * For now, this supports only version 2 queries, but is used only by
- * mount_clnt for NFS_ROOT.
+ * XXX: Needs to support IPv6, and rpcbind versions 3 and 4
  */
-int rpcb_getport_external(struct sockaddr_in *sin, __u32 prog,
-   __u32 vers, int prot)
+int rpcb_getport_sync(struct sockaddr_in *sin, __u32 prog,
+ __u32 vers, int prot)
 {
struct rpcbind_args map = {
.r_prog = prog,
@@ -277,10 +278,10 @@ int rpcb_getport_external(struct sockaddr_in *sin, __u32 prog,
 	char hostname[40];
 	int status;
 
-	dprintk("RPC:   rpcb_getport_external(%u.%u.%u.%u, %u, %u, %d)\n",
-			NIPQUAD(sin->sin_addr.s_addr), prog, vers, prot);
+	dprintk("RPC:   %s(" NIPQUAD_FMT ", %u, %u, %d)\n",
+		__FUNCTION__, NIPQUAD(sin->sin_addr.s_addr), prog, vers, prot);
 
-	sprintf(hostname, "%u.%u.%u.%u", NIPQUAD(sin->sin_addr.s_addr));
+	sprintf(hostname, NIPQUAD_FMT, NIPQUAD(sin->sin_addr.s_addr));
 	rpcb_clnt = rpcb_create(hostname, (struct sockaddr *)sin, prot, 2, 0);
 	if (IS_ERR(rpcb_clnt))
 		return PTR_ERR(rpcb_clnt);
@@ -294,7 +295,7 @@ int rpcb_getport_external(struct sockaddr_in *sin, __u32 prog,
}
return status;
 }
-#endif
+EXPORT_SYMBOL_GPL(rpcb_getport_sync);
 
 /**
  * rpcb_getport - obtain the port for a given RPC service on a given host



[PATCH 11/13] NFS: Move nfs_copy_user_string

2007-05-21 Thread Chuck Lever
Next patch will add a new function that calls nfs_copy_user_string.

Signed-off-by: Chuck Lever [EMAIL PROTECTED]
---

 fs/nfs/super.c |   42 +-
 1 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 927c1c2..8585fa5 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -1622,6 +1622,27 @@ out_err:
return ERR_PTR(-EINVAL);
 }
 
+static void *nfs_copy_user_string(char *dst, struct nfs_string *src, int maxlen)
+{
+	void *p = NULL;
+
+	if (!src->len)
+		return ERR_PTR(-EINVAL);
+	if (src->len < maxlen)
+		maxlen = src->len;
+	if (dst == NULL) {
+		p = dst = kmalloc(maxlen + 1, GFP_KERNEL);
+		if (p == NULL)
+			return ERR_PTR(-ENOMEM);
+	}
+	if (copy_from_user(dst, src->data, maxlen)) {
+		kfree(p);
+		return ERR_PTR(-EFAULT);
+	}
+	dst[maxlen] = '\0';
+	return dst;
+}
+
 /*
  * Finish setting up a cloned NFS4 superblock
  */
@@ -1646,27 +1667,6 @@ static void nfs4_fill_super(struct super_block *sb)
nfs_initialise_sb(sb);
 }
 
-static void *nfs_copy_user_string(char *dst, struct nfs_string *src, int maxlen)
-{
-	void *p = NULL;
-
-	if (!src->len)
-		return ERR_PTR(-EINVAL);
-	if (src->len < maxlen)
-		maxlen = src->len;
-	if (dst == NULL) {
-		p = dst = kmalloc(maxlen + 1, GFP_KERNEL);
-		if (p == NULL)
-			return ERR_PTR(-ENOMEM);
-	}
-	if (copy_from_user(dst, src->data, maxlen)) {
-		kfree(p);
-		return ERR_PTR(-EFAULT);
-	}
-	dst[maxlen] = '\0';
-	return dst;
-}
-
 /*
  * Get the superblock for an NFS4 mountpoint
  */



Re: [PATCH 08/13] NFS: Add functions to parse nfs mount options to fs/nfs/super.c

2007-05-29 Thread Chuck Lever

Karel Zak wrote:

On Mon, May 21, 2007 at 12:09:54PM -0400, Chuck Lever wrote:

For NFSv2 and NFSv3 mount options.
Signed-off-by: Chuck Lever [EMAIL PROTECTED]


 


+static int nfs_parse_options(char *raw, struct nfs_mount_args *mnt)
+{
+	char *p, *string;
+
+	if (!raw) {
+		dprintk("NFS: mount options string was NULL.\n");
+		return 1;
+	}
+
+	while ((p = strsep (&raw, ",")) != NULL) {
+		substring_t args[MAX_OPT_ARGS];
+		int option, token;
+
+		if (!*p)
+			continue;
+		token = match_token(p, nfs_tokens, args);


+
+		case Opt_context:
+			match_strcpy(mnt->nmd.context, args);
+			break;


 The userspace version (nfs-utils) of this code supports quoted
 context strings. For example:

context="aaa,bbb,ccc",hard

 It seems your code blindly splits the raw option string at every ",".


Karel-

I've never used the context= option, and didn't find any documentation 
describing how it was used.


Is there a clean example of how to use the in-kernel parser to handle 
quoted strings containing commas?
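
For reference, one way to split an option string without breaking quoted values is to track quote state while scanning for the separator.  The sketch below is userspace C and only an assumption about how this could be done; it is not taken from nfs-utils or from the kernel parser, and next_opt_quoted() is a hypothetical helper.  Like strsep(), it modifies the buffer in place.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the next comma-separated item from *cursor, or NULL when the
 * string is exhausted.  Commas between double quotes do not split.
 * The quotes themselves are left in the returned item.
 */
static char *next_opt_quoted(char **cursor)
{
	char *start;
	char *p;
	int in_quotes = 0;

	assert(cursor != NULL);
	start = *cursor;
	if (start == NULL)
		return NULL;

	for (p = start; *p != '\0'; p++) {
		if (*p == '"') {
			in_quotes = !in_quotes;
		} else if (*p == ',' && !in_quotes) {
			*p = '\0';
			*cursor = p + 1;
			return start;
		}
	}
	*cursor = NULL;		/* this was the last item */
	return start;
}
```

With the string from Karel's example, the first call returns the whole context="aaa,bbb,ccc" item and the second returns hard.  An unterminated quote simply swallows the rest of the string as one item; a real parser would probably want to reject that case.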
begin:vcard
fn:Chuck Lever
n:Lever;Chuck
org:Oracle Corporation;Corporate Architecture: Linux Projects Group
adr:;;1015 Granger Avenue;Ann Arbor;MI;48104;USA
title:Principal Member of Staff
tel;work:+1 248 614 5091
x-mozilla-html:FALSE
url:http://oss.oracle.com/~cel/
version:2.1
end:vcard



Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-14 Thread Chuck Lever

Hi Chris-

John Stoffel wrote:

As a user of Netapps, having quotas (if only for reporting purposes)
and some way to migrate non-used files to slower/cheaper storage would
be great.

Ie. being able to setup two pools, one being RAID6, the other being
RAID1, where all currently accessed files are in the RAID1 setup, but
if un-used get migrated to the RAID6 area.  


And of course some way for efficient backups and more importantly
RESTORES of data which is segregated like this.  


I like the way dump and restore was handled in AFS (and now ZFS and 
NetApp).  There is a simple command to flatten a file system and send it 
to another system, which can receive it and re-expand it.  The 
dump/restore process uses snapshots and can easily send incremental 
backups which are significantly smaller than 0-level.  This is somewhat 
better than rsync, because you don't need checksums to discover what 
data has changed -- you already have the new data segregated into 
copied-on-write blocks.
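
To make the point about not needing checksums concrete, here is a small sketch of how an incremental dump could pick its blocks.  It assumes, purely for illustration, that every block records the generation (transaction number) in which it was last written; struct block and incremental_blocks() are hypothetical and do not describe the on-disk format of AFS, ZFS, NetApp, or btrfs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block metadata in a copy-on-write file system. */
struct block {
	uint64_t blockno;
	uint64_t born_gen;	/* generation in which this copy was written */
};

/*
 * Collect the numbers of all blocks written after the base snapshot's
 * generation.  Returns how many blocks changed; at most out_cap of them
 * are stored in out[].
 */
static size_t incremental_blocks(const struct block *blocks, size_t nblocks,
				 uint64_t base_gen,
				 uint64_t *out, size_t out_cap)
{
	size_t i, n = 0;

	assert(blocks != NULL || nblocks == 0);

	for (i = 0; i < nblocks; i++) {
		if (blocks[i].born_gen <= base_gen)
			continue;	/* unchanged since the base snapshot */
		if (n < out_cap)
			out[n] = blocks[i].blockno;
		n++;
	}
	return n;
}
```

A level-0 dump is then just the special case base_gen == 0, and no block contents are read or compared to decide what to send.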


NetApp happens to use the standard NDMP protocol for sending the 
flattened file system.  NetApp uses it for synchronous replication, 
volume migration, and back up to nearline storage and tape.  AFS used 
"vol dump" and "vol restore" for migration, replication, and back-up. 
ZFS has the "zfs send" and "zfs receive" commands that do basically the 
same (Eric Kustarz recently published a blog entry that described how 
these work).  And of course, all file system objects are able to be sent 
this way:  streams, xattrs, ACLs, and so on are all supported.


Note also that NFSv4 supports the idea of migrated or replicated file 
objects.  All that is needed to support it is a mechanism on the servers 
to actually move the data.



Re: [ANNOUNCE] Btrfs: a copy on write, snapshotting FS

2007-06-15 Thread Chuck Lever

Chris Mason wrote:

On Thu, Jun 14, 2007 at 02:20:26PM -0400, Chuck Lever wrote:
NetApp happens to use the standard NDMP protocol for sending the 
flattened file system.  NetApp uses it for synchronous replication, 
volume migration, and back up to nearline storage and tape.  AFS used 
"vol dump" and "vol restore" for migration, replication, and back-up. 
ZFS has the "zfs send" and "zfs receive" commands that do basically the 
same (Eric Kustarz recently published a blog entry that described how 
these work).  And of course, all file system objects are able to be sent 
this way:  streams, xattrs, ACLs, and so on are all supported.


Note also that NFSv4 supports the idea of migrated or replicated file 
objects.  All that is needed to support it is a mechanism on the servers 
to actually move the data.


Stringing the replication together with the underlying FS would be neat.
Is there a way to deal with a master/slave setup, where the slave may be
out of date?


Among the implementations I'm aware of, there is a varying degree of 
integration into the physical file system.  In general, it depends on 
how far out of date the slave is, and how closely the slave is supposed 
to be synchronized to the master.


A hot backup file system, for example, should be data-consistent within 
a few seconds of the master.  A snapshot is used to initialize a slave, 
followed by a live stream of updates to the master being sent to slaves. 
 Such a mechanism already exists on NetApp filers because they gather 
changes in NVRAM before committing them to the local file system. 
Simply put, these changes can also be bundled and sent to a local hot 
backup filer that is attached via Infiniband, or over the network to a 
remote hot backup filer.


For AFS, replication is done by maintaining a rw and ro copy of a volume 
on the designated master server.  Changes are made to the rw copy over 
time.  When admins want to push out a new version to replicas on another 
server, the ro copy on the master is replaced with a new snapshot, then 
this is pushed to the slaves.  The replicas are always ro and are used 
mostly for load balancing; clients contact the closest or fastest server 
containing a replica of the volume they want to access.  They always 
have a complete copy of the volume (ie no COW on the slaves).


I think you have designed into btrfs a lot of opportunity to implement 
this kind of data virtualization and management... I'm excited to see 
what can be done.



Re: Adding subroot information to /proc/mounts, or obtaining that through other means

2007-06-20 Thread Chuck Lever

Al Viro wrote:

On Wed, Jun 20, 2007 at 01:57:33PM -0700, H. Peter Anvin wrote:

... or, alternatively, add a subfield to the first field (which would
entail escaping whatever separator we choose):

/dev/md6 /export ext3 rw,data=ordered 0 0
/dev/md6:/users/foo /home/foo ext3 rw,data=ordered 0 0
/dev/md6:/users/bar /home/bar ext3 rw,data=ordered 0 0


Hell, no.  The first field is in principle impossible to parse unless
you know the fs type.

How about making a new file with sane format?  From the very
beginning.  E.g. mountpoint + ID + relative path + type + options,
where ID uniquely identifies superblock (e.g. numeric st_dev)
and backing device (if any) is sitting among the options...


To support NFS client performance statistics, I recently added 
/proc/self/mountstats.  That might be a place to add details about 
--move and --bind mounts without changing the format of /proc/mounts.



Re: Adding subroot information to /proc/mounts, or obtaining that through other means

2007-06-20 Thread Chuck Lever

H. Peter Anvin wrote:

Chuck Lever wrote:

To support NFS client performance statistics, I recently added
/proc/self/mountstats.  That might be a place to add details about
--move and --bind mounts without changing the format of /proc/mounts.


I just looked at /proc/self/mountstats; it seems to have no more
information than /proc/self/mounts, but in an even more annoying format.
Either I'm missing something, or this file doesn't add anything at all.


The advantage is that it doesn't have strong user space dependencies on 
its format like /proc/mounts does.


If you have NFS mount points, you will see that it includes a great deal 
of additional information about each mount.



Re: Adding subroot information to /proc/mounts, or obtaining that through other means

2007-06-20 Thread Chuck Lever

H. Peter Anvin wrote:

Chuck Lever wrote:

The advantage is that it doesn't have strong user space dependencies on
its format like /proc/mounts does.

If you have NFS mount points, you will see that it includes a great deal
of additional information about each mount.


OK, I see now:
device raidtest:/export mounted on /net/raidtest/export with fstype nfs statvers=1.0
opts:   rw,vers=3,rsize=131072,wsize=131072,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,proto=tcp,timeo=600,retrans=2,sec=sys
age:    5
caps:   caps=0x9,wtmult=4096,dtsize=4096,bsize=0,namelen=255
sec:    flavor=1,pseudoflavor=1
events: 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
bytes:  0 0 0 0 0 0 0 0
RPC iostats version: 1.0  p/v: 13/3 (nfs)
xprt:   tcp 686 0 2 0 5 8 8 0 8 0
per-op statistics
        NULL: 0 0 0 0 0 0 0 0
     GETATTR: 2 2 0 264 224 1 0 1
     SETATTR: 0 0 0 0 0 0 0 0
      LOOKUP: 0 0 0 0 0 0 0 0
      ACCESS: 1 1 0 116 120 0 0 0
    READLINK: 0 0 0 0 0 0 0 0
        READ: 0 0 0 0 0 0 0 0
       WRITE: 0 0 0 0 0 0 0 0
      CREATE: 0 0 0 0 0 0 0 0
       MKDIR: 0 0 0 0 0 0 0 0
     SYMLINK: 0 0 0 0 0 0 0 0
       MKNOD: 0 0 0 0 0 0 0 0
      REMOVE: 0 0 0 0 0 0 0 0
       RMDIR: 0 0 0 0 0 0 0 0
      RENAME: 0 0 0 0 0 0 0 0
        LINK: 0 0 0 0 0 0 0 0
     READDIR: 0 0 0 0 0 0 0 0
 READDIRPLUS: 0 0 0 0 0 0 0 0
      FSSTAT: 1 1 0 132 84 0 1 1
      FSINFO: 1 1 0 132 80 0 0 0
    PATHCONF: 0 0 0 0 0 0 0 0
      COMMIT: 0 0 0 0 0 0 0 0

This format is just awful for parsing.  It's pretty clearly totally
ad-hoc.  It's not even self-consistent (it uses different separators,
etc, in the same file!)  It's reasonably compact for human consumption,
but it doesn't show what the arrays mean.

Heck, XML would have been better than this mess...


Sigh.  So where were you when I asked for review time and again?

I have a couple of simple Python scripts that can parse this without any 
difficulty.
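
For illustration only, here is a minimal sketch of how a dump like the one 
quoted above might be parsed.  It is not one of the scripts mentioned here: 
the function name parse_mountstats and the dictionary layout are assumptions, 
and the per-op counters are kept as plain integer lists because the dump 
does not say what each column means.

```python
def parse_mountstats(text):
    """Parse mountstats-style text into a list of per-mount dicts.

    Hypothetical helper: only the "device ..." header and the per-op
    lines are interpreted; every other line is skipped.
    """
    mounts = []
    current = None
    in_per_op = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("device "):
            # "device <dev> mounted on <dir> with fstype <type> ..."
            words = line.split()
            current = {
                "device": words[1],
                "mountpoint": words[4],
                "fstype": words[7],
                "ops": {},
            }
            mounts.append(current)
            in_per_op = False
        elif current is None:
            continue
        elif line == "per-op statistics":
            in_per_op = True
        elif in_per_op and ":" in line:
            # "<OPNAME>: n n n n n n n n"
            name, _, counters = line.partition(":")
            current["ops"][name.strip()] = [int(n) for n in counters.split()]
    return mounts
```

Calling parse_mountstats(open("/proc/self/mountstats").read()) on a system 
with NFS mounts would give one dict per mount, with the per-op counters 
keyed by operation name.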


I resent your tone.  Quite a bit.



Re: request for patches: showing mount options

2007-07-27 Thread Chuck Lever

Miklos:

Some mount options are never passed to the kernel, and thus can't appear 
in /proc/mounts.  Examples include user, users, and _netdev for NFS.


Miklos Szeredi wrote:

[please consider pruning the CC list if discussing some aspect, which
doesn't concern all]

I've done an audit of all filesystems with regards to showing mount
options in /proc/pid/mounts.  Unfortunately most of them show none
or only a part of all accepted options (for details see list of
filesystems at the end of the mail).

This is currently not a big problem, because mount(8) stores the given
options in /etc/mtab.  However we want to get rid of mtab, and this
requires, that the option showing be fixed up.

It would be easiest if this was done by the VFS instead of having to
deal with it in filesystems.  However there are differences in how
filesystems handle options during mount and remount, and it would be
impossible to take this into account in all cases.

If you are CC-ed, and responsible for one of these filesystems, please
take a moment to fully implement the ->show_options() method.  In most
cases it should be an easy task.

If for some reason you are unable to do this, please let me know and
I'll fix it up.

Here are some guidelines for showing options.  I'll also add these to
Documentation/filesystems/vfs.txt

+   If a filesystem accepts mount options, it must define show_options()
+   to show all the currently active options.  The rules are:
+
+ - options MUST be shown which are not default or their values differ
+   from the default
+
+ - options MAY be shown which are enabled by default or have their
+   default value
+
+   Options used only internally between a mount helper and the kernel
+   (such as file descriptors), or which only have an effect during the
+   mounting (such as ones controlling the creation of a journal) are exempt
+   from the above rules.

Thanks,
Miklos
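
The show_options() rules quoted above can be modeled in a few lines.  This 
is a sketch of the rules only, written in Python rather than kernel C; the 
function signature, the "internal" set and the show_defaults switch are 
assumptions made for illustration and do not correspond to any kernel 
interface.

```python
def show_options(active, defaults, internal=frozenset(), show_defaults=False):
    """Return the option string a filesystem would report.

    active:   dict of option name -> value (True for bare flags)
    defaults: dict of option name -> default value
    internal: names exempt from the rules (helper-only or mount-time-only)
    """
    parts = []
    for name in sorted(active):
        if name in internal:
            continue  # exempt: never reported
        value = active[name]
        is_default = name in defaults and defaults[name] == value
        if is_default and not show_defaults:
            continue  # MAY be shown; this sketch omits them unless asked
        # Non-default options MUST be shown.
        parts.append(name if value is True else f"{name}={value}")
    return ",".join(parts)
```

With active options {"rw": True, "data": "ordered", "fd": 5}, defaults 
{"rw": True, "data": "writeback"} and "fd" marked internal, only the 
non-default data=ordered is reported unless show_defaults is set.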

---
legend:

  all - fs has options, but doesn't define ->show_options()
  some - fs defines ->show_options(), but some options are not shown
  noopt - fs does not have options
  good - fs shows all options
  patch - I have a patch


9p          some
adfs        all (maintainer?)
affs        all
afs         all
autofs      all
autofs4     some
befs        all
bfs         noopt
cifs        some (odd parser)
coda        noopt
configfs    noopt
cramfs      noopt
debugfs     noopt
devpts      patch
ecryptfs    some
efs         noopt
ext2        patch
ext3        patch
ext4        patch
fat         some
freevxfs    noopt
fuse        patch
gfs2        good
hfs         good
hfsplus     good
hostfs      patch
hpfs        all
hppfs       noopt
hugetlbfs   all
isofs       all (maintainer?)
jffs2       noopt
jfs         some
minix       noopt
msdos       ->fat
ncpfs       all (FS_BINARY_MOUNTDATA?)
nfs         some
nfsd        noopt
ntfs        good (odd parser)
ocfs2       all
openpromfs  noopt
proc        noopt
qnx4        noopt
ramfs       noopt
reiserfs    all
romfs       noopt
smbfs       good (odd parser) (maintainer?)
sysfs       noopt
sysv        noopt
udf         all
ufs         all
vfat        ->fat
xfs         some (odd parser)

mm/shmem.c                              patch
drivers/oprofile/oprofilefs.c           noopt
drivers/infiniband/hw/ipath/ipath_fs.c  noopt
drivers/misc/ibmasm/ibmasmfs.c          noopt
drivers/usb/core (usbfs)                noopt
drivers/usb/gadget (gadgetfs)           noopt
drivers/isdn/capi/capifs.c              noopt
kernel/cpuset.c                         noopt
fs/binfmt_misc.c                        noopt
net/sunrpc/rpc_pipe.c                   noopt
arch/powerpc/platforms/cell/spufs       all
arch/s390/hypfs                         all
ipc/mqueue.c                            noopt
security (securityfs)                   noopt
security/selinux/selinuxfs.c            noopt

in -mm:

reiser4             some (odd parser)
kernel/container.c  good (odd parser)
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: request for patches: showing mount options

2007-07-27 Thread Chuck Lever

Miklos Szeredi wrote:
Some mount options are never passed to the kernel, and thus can't appear 
in /proc/mounts.  Examples include user, users, and _netdev for NFS.


These options control *who* may mount and *when* to mount.  They are
not a property of the mount itself and are not added to /etc/mtab.

There's a user=ID option that is added to /etc/mtab in case of user
mounts.  This identifies the owner of the mount, so that it can be
unmounted by that user.  There are patches in -mm that enable the
kernel to store this info.

Do you have other examples in mind?


[no]quota comes to mind; also auto, [no]owner, [no]group, and 
quiet/loud, but these may fall into the same category you mention above.


Aside: It's a confusing artifact of the mount CLI that these options 
control who/when but are passed to the mount command in the same way the 
other options are.



Re: Correct behavior on O_DIRECT sparse file writes

2007-10-15 Thread Chuck Lever

Florian Weimer wrote:

* Andrew Morton:


I don't think it's a bug.  Sure, O_DIRECT is synchronous, but that's
because it is, err, direct.  Not because it provides extra data-integrity
guarantees.  If you want those guarantees, use O_SYNC as well.


This needs to be prominently documented.  Right now, it's far from clear
that you need both O_DIRECT and O_SYNC.


It's certainly not a requirement for NFS.  O_DIRECT on NFS forces data 
to the server, which always updates a file's metadata on each write, 
including indirect blocks.
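
For readers who do want both flags on a local file system, as suggested in 
the quoted text, a minimal sketch follows.  The 4096-byte block size and the 
helper names are assumptions; the real alignment requirement depends on the 
device and file system, and some file systems reject O_DIRECT outright.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; check the actual device/fs requirement


def direct_sync_flags():
    """Flags for a write-only open that asks for both O_DIRECT and O_SYNC."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_SYNC
    # O_DIRECT is not defined on every platform, so fall back to 0 there.
    flags |= getattr(os, "O_DIRECT", 0)
    return flags


def write_block(path, payload):
    """Write one zero-padded, page-aligned block to path."""
    if len(payload) > BLOCK:
        raise ValueError("payload larger than one block")
    buf = mmap.mmap(-1, BLOCK)  # anonymous mappings are page-aligned
    buf.write(payload)
    fd = os.open(path, direct_sync_flags(), 0o644)
    try:
        return os.write(fd, buf)  # writes the whole BLOCK-sized buffer
    finally:
        os.close(fd)
        buf.close()
```

The anonymous mmap is used only because O_DIRECT typically needs an aligned 
buffer, which an ordinary Python bytes object does not guarantee.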



Re: Beagle and logging inotify events

2007-11-14 Thread Chuck Lever

On Nov 13, 2007, at 7:04 PM, Jon Smirl wrote:
Is it feasible to do something like this in the linux file system  
architecture?


Beagle beats on my disk for an hour when I reboot. Of course I don't
like that and I shut Beagle off.


Leopard, by the way, does exactly this: it has a daemon that starts  
at boot time and taps FSEvents then journals file system changes to a  
well-known file on local disk.


I don't see why this couldn't be done on Linux as well.


-- Forwarded message --
From: Jon Smirl [EMAIL PROTECTED]
Date: Nov 13, 2007 4:44 PM
Subject: Re: Strange beagle interaction..
To: Linus Torvalds [EMAIL PROTECTED]
Cc: J. Bruce Fields [EMAIL PROTECTED], Junio C Hamano
[EMAIL PROTECTED], Git Mailing List [EMAIL PROTECTED], Johannes
Schindelin [EMAIL PROTECTED]


On 11/13/07, Linus Torvalds [EMAIL PROTECTED] wrote:



On Tue, 13 Nov 2007, J. Bruce Fields wrote:


Last I ran across this, I believe I found it was adding extended
attributes to the file.


Yeah, I just straced it and found the same thing. It's saving fingerprints
and mtimes to files in the extended attributes.


Things like Beagle need a guaranteed log of global inotify events.
That would let them efficiently find changes made since the last time
they updated their index.

Right now every time Beagle starts it hasn't got a clue what has
changed in the file system since it was last run. This forces Beagle
to rescan the entire filesystem every time it is started. The xattrs
are used as cache to reduce this load somewhat.

A better solution would be for the kernel to log inotify events to
disk in a manner that survives reboots. When Beagle starts it would
locate its last checkpoint and then process the logged inotify events
from that time forward. This inotify logging needs to be bullet proof
or it will mess up your Beagle index.

Logged file systems already contain the logged inotify data (in their
own internal form). There's just no universal API for retrieving it in
a file system independent manner.




Yeah, I just turned off beagle.  It looked to me like it was doing
something wrongheaded.


Gaah. The problem is, setting xattrs does actually change ctime. Which
means that if we want to make git play nice with beagle, I guess we have
to just remove the comparison of ctime.

Oh, well. Git doesn't *require* it, but I like the notion of checking the
inode really really carefully. But it looks like it may not be an option,
because of file indexers hiding stuff behind our backs.

Or we could just tell people not to run beagle on their git trees, but I
suspect some people will actually *want* to. Even if it flushes their disk
caches.

Linus
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html




--
Jon Smirl
[EMAIL PROTECTED]




--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





Re: Beagle and logging inotify events

2007-11-14 Thread Chuck Lever

Jon Smirl wrote:

On 11/14/07, Chuck Lever [EMAIL PROTECTED] wrote:

On Nov 13, 2007, at 7:04 PM, Jon Smirl wrote:

Is it feasible to do something like this in the linux file system
architecture?

Beagle beats on my disk for an hour when I reboot. Of course I don't
like that and I shut Beagle off.

Leopard, by the way, does exactly this: it has a daemon that starts
at boot time and taps FSEvents then journals file system changes to a
well-known file on local disk.


Logging file systems have all of the needed info. Plus they know what
is going on with rollback/replay after a crash.


True, but not all file systems have a journal.  Consider ext2 or FAT32, 
both of which are still common.



How about a fs API
where Beagle has a token for a checkpoint, and then it can ask for a
recreation of inotify events from that point forward.  It's always
possible for the file system to say I can't do that and trigger a full
rebuild from Beagle. Daemons that aren't coordinated with the file
system have a window during crash/reboot where they can get confused.


A reasonably effective solution can be implemented in user space without 
changes to the file system APIs or implementations.  IOW we already have 
the tools to make something useful.


For example, you don't need to record every file system event to make 
this useful.  Listing only directory-level changes (ie some file in 
this directory has changed) is enough to prune most of Beagle's work 
when it starts up.
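
As a rough illustration of the directory-level idea, the sketch below 
collapses per-file change records into the set of directories that need a 
rescan.  The function names and the shape of the input (a list of changed 
paths) are assumptions; nothing here reflects how Beagle or any existing 
daemon stores its data.

```python
import os


def dirty_directories(changed_paths):
    """Collapse per-file change records into the set of parent directories."""
    return {os.path.dirname(p) or "." for p in changed_paths}


def directories_to_rescan(all_dirs, changed_paths):
    """Return only the directories an indexer needs to revisit at start-up."""
    dirty = dirty_directories(changed_paths)
    return [d for d in all_dirs if d in dirty]
```

Any number of changes inside one directory cost a single log entry, which 
keeps the log small while still letting the indexer skip every directory 
that was not touched.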



Without low level support like this Beagle is forced to do a rescan on
every boot. Since I crash my machine all of the time the disk load
from rebooting is intolerable and I turn Beagle off. Even just turning
the machine on in the morning generates an annoyingly large load on
the disk.


Understood.  The need is clear.

My Dad's WinXP system takes 10 minutes after every start-up before it's 
usable, simply because the virus scanner has to check every file in the 
system.  Same problem!



I don't see why this couldn't be done on Linux as well.



--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com










Re: Beagle and logging inotify events

2007-11-14 Thread Chuck Lever

Jon Smirl wrote:

On 11/14/07, Chuck Lever [EMAIL PROTECTED] wrote:

Jon Smirl wrote:

On 11/14/07, Chuck Lever [EMAIL PROTECTED] wrote:

On Nov 13, 2007, at 7:04 PM, Jon Smirl wrote:

Is it feasible to do something like this in the linux file system
architecture?

Beagle beats on my disk for an hour when I reboot. Of course I don't
like that and I shut Beagle off.

Leopard, by the way, does exactly this: it has a daemon that starts
at boot time and taps FSEvents then journals file system changes to a
well-known file on local disk.

Logging file systems have all of the needed info. Plus they know what
is going on with rollback/replay after a crash.

True, but not all file systems have a journal.  Consider ext2 or FAT32,
both of which are still common.


ext2/FAT32 can use the daemon approach you describe below which also
works as a short term solution. The Beagle people do have a daemon but
it can be turned off. Holes where you don't record the inotify events
and update the index are really bad because they can make files that
you know are on the disk disappear from the index.  I don't believe
Beagle distinguishes between someone turning it off for a day and then
turning it back on, vs a reboot. In both cases it says there was a
window where untracked changes could have happened and it triggers a
full rescan.

The root problem here is needing a bullet proof inotify log with no
windows.


I disagree: we don't need a bullet-proof log.  We can get a 
significant performance improvement even with a permanent dnotify log 
implemented in user-space.  We already have well-defined fallback 
behavior if such a log is missing or incomplete.


The problem with a permanent inotify log is that it can become 
unmanageably enormous, and a performance problem to boot.  Recording at 
that level of detail makes it more likely that the logger won't be able 
to keep up with file system activity.


A lightweight solution gets us most of the way there, is simple to 
implement, and doesn't introduce many new issues.  As long as it can 
tell us precisely where the holes are, it shouldn't be a problem.
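
One way to picture "telling us where the holes are": if the logger records 
the time intervals during which it was running, the indexer can compute the 
gaps since its last checkpoint and fall back to a full rescan only when a 
gap exists.  This is a sketch under that assumption; the interval 
representation and both function names are hypothetical.

```python
def find_holes(intervals, since, now):
    """Return the (start, end) gaps in log coverage between since and now.

    intervals: iterable of (start, end) times during which the logger ran.
    """
    holes = []
    covered_until = since
    for start, end in sorted(intervals):
        if end <= covered_until:
            continue  # interval adds no new coverage
        if start > covered_until:
            holes.append((covered_until, min(start, now)))
        covered_until = end
        if covered_until >= now:
            break
    if covered_until < now:
        holes.append((covered_until, now))
    return holes


def startup_plan(intervals, since, now):
    """Use the log if coverage is complete, otherwise fall back to a rescan."""
    return "full-rescan" if find_holes(intervals, since, now) else "incremental"
```

A finer-grained variant could rescan only what might have changed during 
each hole, but that needs more information than this sketch records.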



The only place that is going to happen is inside the file
system logs.


As Andi points out, existing block-based journaling implementations 
won't easily provide this.  And most fs journals are actually pretty 
limited in size.


Alternately, you could insert a stackable file system layer between the 
VFS and the on-disk fs to provide more seamless information about updates.



We just need an API to say recreate the inotify stream
from this checkpoint forward. Things like FAT/ext2 will always return
a no data available error from this API.


How about a fs API
where Beagle has a token for a checkpoint, and then it can ask for a
recreation of inotify events from that point forward.  It's always
possible for the file system to say I can't do that and trigger a full
rebuild from Beagle. Daemons that aren't coordinated with the file
system have a window during crash/reboot where they can get confused.

A reasonably effective solution can be implemented in user space without
changes to the file system APIs or implementations.  IOW we already have
the tools to make something useful.

For example, you don't need to record every file system event to make
this useful.  Listing only directory-level changes (ie some file in
this directory has changed) is enough to prune most of Beagle's work
when it starts up.


Without low level support like this Beagle is forced to do a rescan on
every boot. Since I crash my machine all of the time the disk load
from rebooting is intolerable and I turn Beagle off. Even just turning
the machine on in the morning generates an annoyingly large load on
the disk.

Understood.  The need is clear.

My Dad's WinXP system takes 10 minutes after every start-up before it's
usable, simply because the virus scanner has to check every file in the
system.  Same problem!


I don't see why this couldn't be done on Linux as well.



Re: [patch] VFS: extend /proc/mounts

2008-01-17 Thread Chuck Lever

On Jan 17, 2008, at 3:55 AM, Miklos Szeredi wrote:
Hey, I just found /proc/X/mountstats.  How does this fit in to the big
picture?


It seems to show some counters for NFS mounts, no other filesystem
uses it.  Format looks rather less nice, than /proc/X/mounts (why do
we need long english sentences under /proc?).



I introduced /proc/self/mountstats because we need a way for
non-block-device-based file systems to report I/O statistics.  Everything
else I tried was rejected, and apparently what we ended up with was
reviewed by only a handful of people, so no one else likes it or uses
it.


It can go away for all I care, as long as we retain some flexible  
mechanism for non-block-based file systems to report I/O stats.  As  
far as I am aware, there are only two user utilities that understand  
and parse this data, and I maintain both.


--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


Re: [PATCH 0/3] enhanced ESTALE error handling

2008-01-18 Thread Chuck Lever

Hi Peter-

On Jan 18, 2008, at 10:35 AM, Peter Staubach wrote:

Hi.

Here is a patch set which modifies the system to enhance the
ESTALE error handling for system calls which take pathnames
as arguments.


The VFS already handles ESTALE.

If a pathname resolution encounters an ESTALE at any point, the  
resolution is restarted exactly once, and an additional flag is  
passed to the file system during each lookup that forces each  
component in the path to be revalidated on the server.  This has no  
possibility of causing an infinite loop.
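
The two-try behavior described above can be sketched as follows.  This is a 
Python model of the control flow only, not the kernel code; lookup_path is a 
hypothetical callable standing in for the per-component lookups, and its 
revalidate argument stands in for the extra flag passed on the second pass.

```python
import errno


def resolve_with_estale_retry(lookup_path, path):
    """Resolve path, retrying exactly once with forced revalidation on ESTALE.

    lookup_path(path, revalidate) is a hypothetical stand-in for the
    file system's lookups; it raises OSError on failure.
    """
    try:
        return lookup_path(path, revalidate=False)
    except OSError as err:
        if err.errno != errno.ESTALE:
            raise
    # Second and final attempt: a further ESTALE propagates to the caller,
    # so the number of attempts is bounded at two.
    return lookup_path(path, revalidate=True)
```

Because there is no loop, the bound on retries is visible in the structure 
of the function itself.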


Is there some part of this logic that is no longer working?




--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


Re: [PATCH 0/3] enhanced ESTALE error handling

2008-01-18 Thread Chuck Lever

On Jan 18, 2008, at 11:55 AM, Peter Staubach wrote:

Chuck Lever wrote:

Hi Peter-

On Jan 18, 2008, at 10:35 AM, Peter Staubach wrote:

Hi.

Here is a patch set which modifies the system to enhance the
ESTALE error handling for system calls which take pathnames
as arguments.


The VFS already handles ESTALE.

If a pathname resolution encounters an ESTALE at any point, the  
resolution is restarted exactly once, and an additional flag is  
passed to the file system during each lookup that forces each  
component in the path to be revalidated on the server.  This has  
no possibility of causing an infinite loop.


Is there some part of this logic that is no longer working?


The VFS does not fully handle ESTALE.  An ESTALE error can occur
during the second pathname resolution attempt.


If an ESTALE occurs during the second resolution attempt, we should  
give up.  When I addressed this issue two years ago, the two-try  
logic was the only acceptable solution because there's no way to  
guarantee the pathname resolution will ever finish unless we put a  
hard limit on it.



There are lots of
reasons, some of which are the 1 second resolution from some file
systems on the server


Which is a server bug, AFAICS.  It's simply impossible to close all  
the windows that result from sloppy file time stamps without  
completely disabling client-side caching.  The NFS protocol relies on  
file time stamps to manage cache coherence.  If the server is lying  
about time stamps, there's no way the client can cache coherently.
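
A toy model shows why coarse time stamps defeat this kind of caching.  It is 
a deliberate simplification, not the NFS client's revalidation logic: the 
cache check compares a single modification time, and the server is assumed 
to truncate time stamps to whole seconds.

```python
def cache_is_valid(cached_mtime, server_mtime):
    """Model of a timestamp-based cache check: unchanged mtime means valid."""
    return cached_mtime == server_mtime


def truncate_to_seconds(t):
    """Model of a server file system that stores 1-second time stamps."""
    return int(t)


# Two writes 0.3 s apart land in the same second on such a server, so a
# client that cached the file after the first write sees no change.
first_write = truncate_to_seconds(1000.2)
second_write = truncate_to_seconds(1000.5)
missed_change = cache_is_valid(first_write, second_write)
```

Here missed_change ends up true: the second write is invisible to the 
client, which is the window being discussed.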



and the window in between the revalidation
and the actual use of the file handle associated with each
dentry/inode pair.


A use case or two would be useful to explore (on linux-nfs or
linux-fsdevel, rather than lkml).



Also, there was no support for ESTALE errors which occur during
operations subsequent to the pathname resolution process.  For
example, during a mkdir(2) operation, the ESTALE can occur from
the over-the-wire MKDIR operation after the LOOKUP operations
have all succeeded.


If the final operation fails after a pathname resolution, then it's a  
real error.  Is there a fixed and valid recovery script for the  
client in this case that will allow the mkdir to proceed?


Admittedly, the NFS client could recover more cleanly from some of  
these problems, but given the architecture of the Linux VFS, it will  
be difficult to address some of the corner cases.


--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


Re: [PATCH 0/3] enhanced ESTALE error handling

2008-01-18 Thread Chuck Lever

On Jan 18, 2008, at 12:30 PM, Peter Staubach wrote:

Chuck Lever wrote:

On Jan 18, 2008, at 11:55 AM, Peter Staubach wrote:

Chuck Lever wrote:

Hi Peter-

On Jan 18, 2008, at 10:35 AM, Peter Staubach wrote:

Hi.

Here is a patch set which modifies the system to enhance the
ESTALE error handling for system calls which take pathnames
as arguments.


The VFS already handles ESTALE.

If a pathname resolution encounters an ESTALE at any point, the  
resolution is restarted exactly once, and an additional flag is  
passed to the file system during each lookup that forces each  
component in the path to be revalidated on the server.  This has  
no possibility of causing an infinite loop.


Is there some part of this logic that is no longer working?


The VFS does not fully handle ESTALE.  An ESTALE error can occur
during the second pathname resolution attempt.


If an ESTALE occurs during the second resolution attempt, we  
should give up.  When I addressed this issue two years ago, the  
two-try logic was the only acceptable solution because there's no  
way to guarantee the pathname resolution will ever finish unless  
we put a hard limit on it.




I can probably imagine a situation where the pathname resolution
would never finish, but I am not sure that it could ever happen
in nature.


Unless someone is doing something malicious.  Or if the server is  
repeatedly returning ESTALE for some reason.



There are lots of
reasons, some of which are the 1 second resolution from some file
systems on the server


Which is a server bug, AFAICS.  It's simply impossible to close  
all the windows that result from sloppy file time stamps without  
completely disabling client-side caching.  The NFS protocol relies  
on file time stamps to manage cache coherence.  If the server is  
lying about time stamps, there's no way the client can cache  
coherently.
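
A minimal sketch of why coarse time stamps defeat this kind of cache validation. The names `sketch_attrs`, `coarse`, and `cache_looks_valid` are hypothetical, and the comparison is a simplification rather than the NFS client's actual revalidation logic.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical attribute snapshot used to validate cached data. */
struct sketch_attrs {
	struct timespec mtime;
	long long size;
};

/* What a server file system with one-second time stamps would report:
 * the sub-second part of the modification time is lost. */
static struct sketch_attrs coarse(struct sketch_attrs a)
{
	a.mtime.tv_nsec = 0;
	return a;
}

/* The client treats its cache as valid when nothing it can see changed. */
static bool cache_looks_valid(struct sketch_attrs cached,
			      struct sketch_attrs fresh)
{
	return cached.mtime.tv_sec == fresh.mtime.tv_sec &&
	       cached.mtime.tv_nsec == fresh.mtime.tv_nsec &&
	       cached.size == fresh.size;
}
```

Two same-size writes at 100.2s and 100.7s are distinguishable with full-resolution time stamps, but after `coarse()` they look identical, so the client keeps using stale cached data.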




Server bug or not, it is something that the client has to live
with.  We can't get the server file system fixed, so it is
something that we should find a way to live with.  This support
can help.


We haven't identified a server-side solution yet, but that doesn't  
mean it doesn't exist.


If we address the time stamp problem in the client, should we also go  
to lengths to address it in every other corner of the NFS client?   
Should we also address every other server bug we discover with a  
client side fix?



Also, there was no support for ESTALE errors which occur during
operations subsequent to the pathname resolution process.  For
example, during a mkdir(2) operation, the ESTALE can come from
the over-the-wire MKDIR operation after the LOOKUP operations
have all succeeded.


If the final operation fails after a pathname resolution, then  
it's a real error.  Is there a fixed and valid recovery script for  
the client in this case that will allow the mkdir to proceed?




Why do you think that it is an error?


Because this is a problem that sometimes requires application-level  
recovery.  Can we guarantee that retrying the mkdir is the right  
thing to do every time?



It can easily occur if the directory in which the new directory
is to be created disappears after it is looked up and before the
MKDIR is issued.

The recovery is to perform the lookup again.


Have you tried this client against a file server when you unexport  
the filesystem under test?  The server returns ESTALE no matter what  
the client does.  Should the client continue to retry the request if  
the file system has been permanently taken offline?


Admittedly, the NFS client could recover more cleanly from some of  
these problems, but given the architecture of the Linux VFS, it  
will be difficult to address some of the corner cases.


Could you outline some of these corner cases that this proposal
would not address, please?


I think we have one right here: should the client retry a mkdir if  
it gets an ESTALE?


--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


Re: [patch 21/26] mount options: partially fix nfs

2008-01-24 Thread Chuck Lever

Hi Miklos-

Miklos Szeredi wrote:

From: Miklos Szeredi [EMAIL PROTECTED]

Add posix, bsize=, namelen= options to /proc/mounts for nfs
filesystems.

Document several other options that are still missing.


NFS lists only some options in /proc/mounts on purpose: only the 
essential options are mentioned there to keep clutter down.  The three 
you've added here are for all intents and purposes deprecated, which is 
why they are not supported.


NFS lists a more complete set of mount options for a mount point in 
/proc/self/mountstats.  See nfs_show_stats().


Since your cover letter does not explain why you are changing this code, 
can you refer me to a description of why you are doing this?


More below.


Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
---

Index: linux/fs/nfs/super.c
===
--- linux.orig/fs/nfs/super.c   2008-01-19 11:56:34.0 +0100
+++ linux/fs/nfs/super.c        2008-01-21 20:41:30.0 +0100
@@ -449,6 +449,7 @@ static void nfs_show_mount_options(struc
 	} nfs_info[] = {
 		{ NFS_MOUNT_SOFT, ",soft", ",hard" },
 		{ NFS_MOUNT_INTR, ",intr", ",nointr" },
+		{ NFS_MOUNT_POSIX, ",posix", "" },
 		{ NFS_MOUNT_NOCTO, ",nocto", "" },
 		{ NFS_MOUNT_NOAC, ",noac", "" },
 		{ NFS_MOUNT_NONLM, ",nolock", "" },
@@ -459,10 +460,17 @@ static void nfs_show_mount_options(struc
 	};
 	const struct proc_nfs_info *nfs_infop;
 	struct nfs_client *clp = nfss->nfs_client;
+	unsigned int default_namelen =
+		clp->rpc_ops->version == 4 ? NFS4_MAXNAMLEN :
+		clp->rpc_ops->version == 3 ? NFS3_MAXNAMLEN : NFS2_MAXNAMLEN;
 
 	seq_printf(m, ",vers=%d", clp->rpc_ops->version);
 	seq_printf(m, ",rsize=%d", nfss->rsize);
 	seq_printf(m, ",wsize=%d", nfss->wsize);
+	if (nfss->bsize != 0)
+		seq_printf(m, ",bsize=%d", nfss->bsize);
+	if (nfss->namelen != default_namelen)
+		seq_printf(m, ",namelen=%d", nfss->namelen);
 	if (nfss->acregmin != 3*HZ || showdefaults)
 		seq_printf(m, ",acregmin=%d", nfss->acregmin/HZ);
 	if (nfss->acregmax != 60*HZ || showdefaults)
@@ -482,6 +490,18 @@ static void nfs_show_mount_options(struc
 	seq_printf(m, ",timeo=%lu", 10U * nfss->client->cl_timeout->to_initval / HZ);
 	seq_printf(m, ",retrans=%u", nfss->client->cl_timeout->to_retries);
 	seq_printf(m, ",sec=%s", nfs_pseudoflavour_to_name(nfss->client->cl_auth->au_flavor));
+
+	/*
+	 * Missing options:
+	 * port=


Probably should be supported.


+	 * addr=


This one is already supported; see nfs_show_options().


+	 * clientaddr=


This one isn't, and should be... would be useful for tracking down 
certain NFSv4 problems.



+	 * mounthost=
+	 * mountaddr=
+	 * mountport=
+	 * mountvers=
+	 * mountproto=

And these mount* options are for the kernel's new mount protocol client.
They aren't really useful for understanding steady-state NFS client
behavior; they only affect mount-time behavior.



Re: [PATCH 24/27] NFS: Use local caching [try #2]

2008-01-24 Thread Chuck Lever

Some comments below.

This patch really ought to be broken into more manageable atomic changes 
to make it easier to review, and to provide more fine-grained 
explanation and rationalization for each specific change via individual 
patch descriptions.


David Howells wrote:

The attached patch makes it possible for the NFS filesystem to make use of the
network filesystem local caching service (FS-Cache).

To be able to use this, an updated mount program is required.  This can be
obtained from:

http://people.redhat.com/steved/fscache/util-linux/


This should no longer be necessary.  The latest mount.nfs subcommand 
from nfs-utils supports text-based mounts when running on kernels 2.6.23 
and later.



To mount an NFS filesystem to use caching, add an fsc option to the mount:

mount warthog:/ /a -o fsc


I hope you intend to provide updates to nfs(5) that describe the new 
mount options you introduce in this and later patches.  You don't 
mention it, but I assume that nofsc is the default behavior.



Signed-off-by: David Howells [EMAIL PROTECTED]
---

 fs/nfs/Makefile           |    1
 fs/nfs/client.c           |    5 +
 fs/nfs/file.c             |   37
 fs/nfs/fscache-def.c      |  289 +
 fs/nfs/fscache.c          |  391 +
 fs/nfs/fscache.h          |  148 +
 fs/nfs/inode.c            |   47 +
 fs/nfs/read.c             |   28 +++
 fs/nfs/super.c            |    3
 fs/nfs/sysctl.c           |    1
 include/linux/nfs_fs.h    |    9 +
 include/linux/nfs_fs_sb.h |   18 ++
 12 files changed, 968 insertions(+), 9 deletions(-)
 create mode 100644 fs/nfs/fscache-def.c
 create mode 100644 fs/nfs/fscache.c
 create mode 100644 fs/nfs/fscache.h


diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index df0f41e..073d04c 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -16,3 +16,4 @@ nfs-$(CONFIG_NFS_V4)  += nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o \
   nfs4namespace.o
 nfs-$(CONFIG_NFS_DIRECTIO) += direct.o
 nfs-$(CONFIG_SYSCTL) += sysctl.o
+nfs-$(CONFIG_NFS_FSCACHE) += fscache.o fscache-def.o
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index a6f6254..bcdc5d0 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -43,6 +43,7 @@
 #include "delegation.h"
 #include "iostat.h"
 #include "internal.h"
+#include "fscache.h"
 
 #define NFSDBG_FACILITY		NFSDBG_CLIENT
 
@@ -139,6 +140,8 @@ static struct nfs_client *nfs_alloc_client(const char *hostname,
 	clp->cl_state = 1 << NFS4CLNT_LEASE_EXPIRED;
 #endif
 
+	nfs_fscache_get_client_cookie(clp);
+
 	return clp;
 
 error_3:
@@ -170,6 +173,8 @@ static void nfs_free_client(struct nfs_client *clp)
 
 	nfs4_shutdown_client(clp);
 
+	nfs_fscache_release_client_cookie(clp);
+
 	/* -EIO all pending I/O */
 	if (!IS_ERR(clp->cl_rpcclient))
 		rpc_shutdown_client(clp->cl_rpcclient);
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index b3bb89f..d492cd7 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -35,6 +35,7 @@
 #include "delegation.h"
 #include "internal.h"
 #include "iostat.h"
+#include "fscache.h"
 
 #define NFSDBG_FACILITY		NFSDBG_FILE
 
@@ -352,22 +353,48 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
 	return status < 0 ? status : copied;
 }
 
+/*
+ * Partially or wholly invalidate a page
+ * - Release the private state associated with a page if undergoing complete
+ *   page invalidation
+ * - Called if either PG_private or PG_fscache set on the page
+ * - Caller holds page lock
+ */


Add comments like this in a separate clean up patch.


 static void nfs_invalidate_page(struct page *page, unsigned long offset)
 {
 	if (offset != 0)
 		return;
 	/* Cancel any unstarted writes on this page */
 	nfs_wb_page_cancel(page->mapping->host, page);
+
+	nfs_fscache_invalidate_page(page, page->mapping->host);
 }
 
+/*
+ * Release the private state associated with a page
+ * - Called if either PG_private or PG_fscache set on the page
+ * - Caller holds page lock
+ * - Return true (may release) or false (may not)
+ */
 static int nfs_release_page(struct page *page, gfp_t gfp)
 {
 	/* If PagePrivate() is set, then the page is not freeable */
-	return 0;
+	if (PagePrivate(page))
+		return 0;
+	return nfs_fscache_release_page(page, gfp);
 }
 
+/*
+ * Attempt to clear the private state associated with a page when an error
+ * occurs that requires the cached contents of an inode to be written back or
+ * destroyed
+ * - Called if either PG_private or PG_fscache set on the page
+ * - Caller holds page lock
+ * - Return 0 if successful, -error otherwise
+ */
 static int nfs_launder_page(struct page *page)
 {
+	wait_on_page_fscache_write(page);
 	return nfs_wb_page(page->mapping->host, page);
 }
 
@@ -387,6 +414,11 @@ const struct address_space_operations nfs_file_aops = {

.launder_page = 

Re: [patch 21/26] mount options: partially fix nfs

2008-01-25 Thread Chuck Lever

On Jan 25, 2008, at 4:39 AM, Miklos Szeredi wrote:

Miklos Szeredi wrote:

From: Miklos Szeredi [EMAIL PROTECTED]

Add posix, bsize=, namelen= options to /proc/mounts for nfs
filesystems.

Document several other options that are still missing.


NFS lists only some options in /proc/mounts on purpose: only the
essential options are mentioned there to keep clutter down.  The three
you've added here are for all intents and purposes deprecated, which is
why they are not supported.

NFS lists a more complete set of mount options for a mount point in
/proc/self/mountstats.  See nfs_show_stats().

Since your cover letter does not explain why you are changing this code,
can you refer me to a description of why you are doing this?


Description is in the 01/26 patch.


More below.


Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
---

Index: linux/fs/nfs/super.c
===
--- linux.orig/fs/nfs/super.c   2008-01-19 11:56:34.0 +0100
+++ linux/fs/nfs/super.c        2008-01-21 20:41:30.0 +0100
@@ -449,6 +449,7 @@ static void nfs_show_mount_options(struc
 	} nfs_info[] = {
 		{ NFS_MOUNT_SOFT, ",soft", ",hard" },
 		{ NFS_MOUNT_INTR, ",intr", ",nointr" },
+		{ NFS_MOUNT_POSIX, ",posix", "" },
 		{ NFS_MOUNT_NOCTO, ",nocto", "" },
 		{ NFS_MOUNT_NOAC, ",noac", "" },
 		{ NFS_MOUNT_NONLM, ",nolock", "" },
@@ -459,10 +460,17 @@ static void nfs_show_mount_options(struc
 	};
 	const struct proc_nfs_info *nfs_infop;
 	struct nfs_client *clp = nfss->nfs_client;
+	unsigned int default_namelen =
+		clp->rpc_ops->version == 4 ? NFS4_MAXNAMLEN :
+		clp->rpc_ops->version == 3 ? NFS3_MAXNAMLEN : NFS2_MAXNAMLEN;
 
 	seq_printf(m, ",vers=%d", clp->rpc_ops->version);
 	seq_printf(m, ",rsize=%d", nfss->rsize);
 	seq_printf(m, ",wsize=%d", nfss->wsize);
+	if (nfss->bsize != 0)
+		seq_printf(m, ",bsize=%d", nfss->bsize);
+	if (nfss->namelen != default_namelen)
+		seq_printf(m, ",namelen=%d", nfss->namelen);
 	if (nfss->acregmin != 3*HZ || showdefaults)
 		seq_printf(m, ",acregmin=%d", nfss->acregmin/HZ);
 	if (nfss->acregmax != 60*HZ || showdefaults)
@@ -482,6 +490,18 @@ static void nfs_show_mount_options(struc
 	seq_printf(m, ",timeo=%lu", 10U * nfss->client->cl_timeout->to_initval / HZ);
 	seq_printf(m, ",retrans=%u", nfss->client->cl_timeout->to_retries);
 	seq_printf(m, ",sec=%s", nfs_pseudoflavour_to_name(nfss->client->cl_auth->au_flavor));
+
+	/*
+	 * Missing options:
+	 * port=


Probably should be supported.


+	 * addr=


This one is already supported; see nfs_show_options().


Right, thanks.




+	 * clientaddr=


This one isn't, and should be... would be useful for tracking down
certain NFSv4 problems.


+	 * mounthost=
+	 * mountaddr=
+	 * mountport=
+	 * mountvers=
+	 * mountproto=


And these mount* options are for the kernel's new mount protocol client.
They aren't really useful for understanding steady-state NFS client
behavior; they only affect mount-time behavior.


All mount options should be shown, which are needed to reconstruct a
previous mount.


Ah, OK.

I'm happy to implement logic to display all the missing options.  I  
should have updated nfs_show_mount_options() when I wrote the NFS  
mount option parser.


Let me know your preference.


For example, if you copy options out from /proc/mounts, umount the
filesystem, and then create a new mount with the copied options, you
should get the same mount.


For NFS, umount also needs to read some of the options in order to  
determine how mountd is to connect to the server for the unmount.   
(That's why we have addr= in the first place).
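
The round trip described above (show the options, unmount, mount again with the shown options) can be sketched in user space like this. Everything here is hypothetical: `struct sketch_mount`, `show_options`, and `parse_options` are illustrative stand-ins, not the kernel's nfs_show_mount_options() or its option parser, and only a handful of options are modeled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down mount state; not the kernel's nfs_server. */
struct sketch_mount {
	int soft;		/* 1 = soft, 0 = hard */
	int vers;
	unsigned int rsize;
	unsigned int wsize;
	unsigned int bsize;	/* 0 = not set, so not shown */
};

/* Emit every option needed to recreate the mount. */
static int show_options(const struct sketch_mount *m, char *buf, size_t len)
{
	int n = snprintf(buf, len, "%s,vers=%d,rsize=%u,wsize=%u",
			 m->soft ? "soft" : "hard", m->vers,
			 m->rsize, m->wsize);

	if (n < 0 || (size_t)n >= len)
		return -1;
	if (m->bsize != 0) {
		int k = snprintf(buf + n, len - n, ",bsize=%u", m->bsize);

		if (k < 0 || (size_t)k >= len - n)
			return -1;
	}
	return 0;
}

/* Parse one flag or "name=value" token into the mount state. */
static int parse_token(const char *tok, struct sketch_mount *m)
{
	if (strcmp(tok, "soft") == 0) { m->soft = 1; return 0; }
	if (strcmp(tok, "hard") == 0) { m->soft = 0; return 0; }
	if (sscanf(tok, "vers=%d", &m->vers) == 1) return 0;
	if (sscanf(tok, "rsize=%u", &m->rsize) == 1) return 0;
	if (sscanf(tok, "wsize=%u", &m->wsize) == 1) return 0;
	if (sscanf(tok, "bsize=%u", &m->bsize) == 1) return 0;
	return -1;	/* unknown option */
}

/* Rebuild the mount state from a comma-separated option string. */
static int parse_options(const char *opts, struct sketch_mount *m)
{
	memset(m, 0, sizeof(*m));
	while (*opts) {
		char tok[64];
		size_t n = strcspn(opts, ",");

		if (n == 0 || n >= sizeof(tok))
			return -1;
		memcpy(tok, opts, n);
		tok[n] = '\0';
		if (parse_token(tok, m) != 0)
			return -1;
		opts += n;
		if (*opts == ',')
			opts++;
	}
	return 0;
}
```

The property being asked for is that `parse_options()` applied to the output of `show_options()` yields the same mount state. Any option that affects the mount but is not shown breaks that property.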


--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


Re: [patch 21/26] mount options: partially fix nfs

2008-01-28 Thread Chuck Lever

On Jan 28, 2008, at 6:34 AM, Miklos Szeredi wrote:

All mount options should be shown, which are needed to reconstruct a
previous mount.


Ah, OK.

I'm happy to implement logic to display all the missing options.  I
should have updated nfs_show_mount_options() when I wrote the NFS
mount option parser.

Let me know your preference.


You are more familiar with NFS, so I think it would be better if you
updated nfs_show_mount_options().

Could you also queue my patch (updated) or incorporate it into a
combined fix?


Yes.  I'll have time in a day or two to get this finished.


Thanks,
Miklos


Subject: mount options: partially fix nfs

From: Miklos Szeredi [EMAIL PROTECTED]

Add posix, bsize=, namelen= options to /proc/mounts for nfs
filesystems.

Document several other options that are still missing.

Changes:

 - display namelen= unconditionally
 - addr= isn't missing after all

Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]
CC: Trond Myklebust [EMAIL PROTECTED]
---

Index: linux/fs/nfs/super.c
===
--- linux.orig/fs/nfs/super.c   2008-01-25 15:44:56.0 +0100
+++ linux/fs/nfs/super.c        2008-01-25 15:57:32.0 +0100
@@ -449,6 +449,7 @@ static void nfs_show_mount_options(struc
 	} nfs_info[] = {
 		{ NFS_MOUNT_SOFT, ",soft", ",hard" },
 		{ NFS_MOUNT_INTR, ",intr", ",nointr" },
+		{ NFS_MOUNT_POSIX, ",posix", "" },
 		{ NFS_MOUNT_NOCTO, ",nocto", "" },
 		{ NFS_MOUNT_NOAC, ",noac", "" },
 		{ NFS_MOUNT_NONLM, ",nolock", "" },
@@ -463,6 +464,9 @@ static void nfs_show_mount_options(struc
 	seq_printf(m, ",vers=%d", clp->rpc_ops->version);
 	seq_printf(m, ",rsize=%d", nfss->rsize);
 	seq_printf(m, ",wsize=%d", nfss->wsize);
+	seq_printf(m, ",namelen=%d", nfss->namelen);
+	if (nfss->bsize != 0)
+		seq_printf(m, ",bsize=%d", nfss->bsize);
 	if (nfss->acregmin != 3*HZ || showdefaults)
 		seq_printf(m, ",acregmin=%d", nfss->acregmin/HZ);
 	if (nfss->acregmax != 60*HZ || showdefaults)
@@ -482,6 +486,17 @@ static void nfs_show_mount_options(struc
 	seq_printf(m, ",timeo=%lu", 10U * nfss->client->cl_timeout->to_initval / HZ);
 	seq_printf(m, ",retrans=%u", nfss->client->cl_timeout->to_retries);
 	seq_printf(m, ",sec=%s", nfs_pseudoflavour_to_name(nfss->client->cl_auth->au_flavor));
+
+	/*
+	 * Missing options:
+	 * port=
+	 * mountport=
+	 * mountvers=
+	 * mountproto=
+	 * clientaddr=
+	 * mounthost=
+	 * mountaddr=
+	 */
 }
 
 /*


--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





Re: [PATCH 24/27] NFS: Use local caching [try #2]

2008-01-30 Thread Chuck Lever

Hi David-

On Jan 29, 2008, at 10:25 PM, David Howells wrote:

Chuck Lever [EMAIL PROTECTED] wrote:

This patch really ought to be broken into more manageable atomic
changes to make it easier to review, and to provide more fine-grained
explanation and rationalization for each specific change via
individual patch descriptions.


Hmmm...  I broke the patch up as Trond stipulated - at least, I thought I
had.

In many ways this request doesn't make sense.  You can't do NFS caching
without all the appropriate bits, so logically they should be one patch.
Breaking it up won't help git-bisect since the option to enable all this is
the last (or nearly last) patch.


In addition to adding a new feature, you are changing existing code.   
If any one of the changes you made breaks existing behavior, having  
them all in small atomic patches makes it practical to bisect and  
find the problem.


In addition, it makes review far easier for people who are not so  
familiar with your fscache implementation.  And smaller patches mean  
the ratio of patch description to code changes can be much higher.


It does make sense to introduce the files under fs/fsc in a single  
patch.  But when you are changing code that is already being used,  
more care needs to be taken.



This should no longer be necessary.  The latest mount.nfs subcommand
from nfs-utils supports text-based mounts when running on kernels
2.6.23 and later.


Okay.  I'll update my patches to reflect this.  Note, however, I've got
someone reporting a bug that seems to show otherwise.  I'll have to
investigate this more next week.


The very latest version (post 1.1.1) is required today for text-based  
NFS mounts.  (That is, the bleeding edge version you get by cloning  
the nfs-utils git repo).


And it only works on kernels later than 2.6.22 -- if that particular  
user is testing fscache on 2.6.22 or older, then only the legacy  
binary NFS mount system call API is supported.



Add comments like this in a separate clean up patch.





+/*
+ * Notification that a PTE pointing to an NFS page is about to be made
+ * writable, implying that someone is about to modify the page through a
+ * shared-writable mapping
+ */

What does that have to do with local disk caching?


+struct nfs_fh_auxdata {
+   struct timespec i_mtime;
+   struct timespec i_ctime;
+   loff_t  i_size;
+};


It might be useful to explain here why you need to supplement the
mtime, ctime, and size fields that already exist in an NFS inode.


Supplement?  I don't understand.


Why is it necessary to add additional mtime, ctime and size fields  
for NFS inodes?  Similar metadata is already stored in nfsi.


All I'm asking for is some documentation of what these fields do that  
the existing time stamps and size fields in nfsi don't.  Explain why  
the NFS fsc implementation needs this data structure.



+	key->port = clp->cl_addr.sin_port;


Not sure why you are using the server's port here.  In almost every
case the server side port number will be 2049, so it really doesn't
add any uniquification.


The reason lies in "almost every case".  It's possible to configure it
such that a server is running two separate NFS servers on different ports.


We should explore whether it is typical or even possible that such a  
configuration exports the same file handles on different ports, and  
whether that really matters to the client.
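
To make the trade-off concrete, here is a small sketch of a server cache key compared with and without the port. The struct layout and `key_equal` are hypothetical and are not the key format used in the patch. Whether two servers on one address ever export the same file handles is still the open question; the sketch only shows what including the port changes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cache key for one NFS server; not the patch's layout. */
struct sketch_server_key {
	uint8_t addr[4];	/* IPv4 address bytes */
	uint16_t port;		/* server port, host byte order */
};

/* Compare two keys, optionally ignoring the port. */
static bool key_equal(const struct sketch_server_key *a,
		      const struct sketch_server_key *b, bool use_port)
{
	if (memcmp(a->addr, b->addr, sizeof(a->addr)) != 0)
		return false;
	return !use_port || a->port == b->port;
}
```

With the port in the key, two servers at the same address on ports 2049 and 2050 get separate cache entries. Without it, they share one.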



I strongly recommend you use the existing IPv6 address conversion
macros for this instead of open-coding yet another way of mapping an
IPv4 address to an IPv6 address.

However, since AF_INET6 support is being introduced in the NFS client
in 2.6.24, I recommend you take a look at these source files after
Trond has pushed his NFS_ALL for 2.6.24.


I'll look at them.


I always do this:  I meant 2.6.25, not 2.6.24.

By the time you return, basic IPv6 support for NFSv4 should be in  
2.6.25-rc1's NFS client (not server).  Not that it is bug-free, but  
an implementation is now there.


--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com