Re: [patch 07/10] unprivileged mounts: add sysctl tunable for safe property

2008-02-07 Thread Miklos Szeredi
> > > > Maybe sysctls just need to check capabilities, instead of uids.  I
> > > > think that would make a lot of sense anyway.
> > >
> > > Would it be as simple as tagging the inodes with capability sets?  One
> > > set for writing, or one each for reading and writing?
> >
> > Yes, or something even simpler, like mapping the owner permission bits
> > to CAP_SYS_ADMIN.  There seem to be very few different permissions
> > under /proc/sys:
> >
> > --w-------
> > -r--r--r--
> > -rw-------
> > -rw-r--r--
> >
> > As long as the group and other bits are always the same, and we accept
> > that the owner bits really mean CAP_SYS_ADMIN and not something else,
>
> But I would assume some things under /proc/sys/net/ipv4 or
> /proc/sys/net/ath0 require CAP_NET_ADMIN rather than CAP_SYS_ADMIN?

I guess so.  I'm not very familiar with the different capabilities :)

How about this patch then: a hybrid solution between just relying on
permission bits, and specifying separate capability sets for read and
write in addition to the permission bits.

Untested, the 'cap' field obviously still needs to be filled in where
appropriate.

Miklos


Index: linux/include/linux/sysctl.h
===
--- linux.orig/include/linux/sysctl.h	2008-02-04 12:29:01.0 +0100
+++ linux/include/linux/sysctl.h	2008-02-07 15:19:06.0 +0100
@@ -1041,6 +1041,7 @@ struct ctl_table
 	void *data;
 	int maxlen;
 	mode_t mode;
+	int cap;			/* Capability needed to read/write */
 	struct ctl_table *child;
 	struct ctl_table *parent;	/* Automatically set */
 	proc_handler *proc_handler;	/* Callback for text formatting */
Index: linux/kernel/sysctl.c
===
--- linux.orig/kernel/sysctl.c	2008-02-05 22:17:05.0 +0100
+++ linux/kernel/sysctl.c	2008-02-07 15:30:45.0 +0100
@@ -1527,14 +1527,26 @@ out:
  * some sysctl variables are readonly even to root.
  */
 
-static int test_perm(int mode, int op)
+static int test_perm(struct ctl_table *table, int op)
 {
-	if (!current->euid)
-		mode >>= 6;
-	else if (in_egroup_p(0))
-		mode >>= 3;
+	int cap = table->cap;
+	mode_t mode = table->mode;
+
+	if (!cap)
+		cap = CAP_SYS_ADMIN;
+
+	if ((op & MAY_READ) && !(mode & S_IRUGO))
+		return -EACCES;
+
+	if ((op & MAY_WRITE) && !(mode & S_IWUGO))
+		return -EACCES;
+
+	if (capable(cap))
+		return 0;
+
 	if ((mode & op & 0007) == op)
 		return 0;
+
 	return -EACCES;
 }
 
@@ -1544,7 +1556,7 @@ int sysctl_perm(struct ctl_table *table,
 	error = security_sysctl(table, op);
 	if (error)
 		return error;
-	return test_perm(table->mode, op);
+	return test_perm(table, op);
 }
 
 #ifdef CONFIG_SYSCTL_SYSCALL
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 07/10] unprivileged mounts: add sysctl tunable for safe property

2008-02-07 Thread Serge E. Hallyn
Quoting Miklos Szeredi ([EMAIL PROTECTED]):
> > > Maybe sysctls just need to check capabilities, instead of uids.  I
> > > think that would make a lot of sense anyway.
> >
> > Would it be as simple as tagging the inodes with capability sets?  One
> > set for writing, or one each for reading and writing?
>
> Yes, or something even simpler, like mapping the owner permission bits
> to CAP_SYS_ADMIN.  There seem to be very few different permissions
> under /proc/sys:
>
> --w-------
> -r--r--r--
> -rw-------
> -rw-r--r--
>
> As long as the group and other bits are always the same, and we accept
> that the owner bits really mean CAP_SYS_ADMIN and not something else,

But I would assume some things under /proc/sys/net/ipv4 or
/proc/sys/net/ath0 require CAP_NET_ADMIN rather than CAP_SYS_ADMIN?

> then the permission check would not need to look at uids or gids at
> all.
>
> Miklos


Re: [patch 07/10] unprivileged mounts: add sysctl tunable for safe property

2008-02-07 Thread Serge E. Hallyn
Quoting Miklos Szeredi ([EMAIL PROTECTED]):
> > > > > Maybe sysctls just need to check capabilities, instead of uids.  I
> > > > > think that would make a lot of sense anyway.
> > > >
> > > > Would it be as simple as tagging the inodes with capability sets?  One
> > > > set for writing, or one each for reading and writing?
> > >
> > > Yes, or something even simpler, like mapping the owner permission bits
> > > to CAP_SYS_ADMIN.  There seem to be very few different permissions
> > > under /proc/sys:
> > >
> > > --w-------
> > > -r--r--r--
> > > -rw-------
> > > -rw-r--r--
> > >
> > > As long as the group and other bits are always the same, and we accept
> > > that the owner bits really mean CAP_SYS_ADMIN and not something else,
> >
> > But I would assume some things under /proc/sys/net/ipv4 or
> > /proc/sys/net/ath0 require CAP_NET_ADMIN rather than CAP_SYS_ADMIN?
>
> I guess so.  I'm not very familiar with the different capabilities :)
>
> How about this patch then: a hybrid solution between just relying on
> permission bits, and specifying separate capability sets for read and
> write in addition to the permission bits.
>
> Untested, the 'cap' field obviously still needs to be filled in where
> appropriate.
>
> Miklos
>
>
> Index: linux/include/linux/sysctl.h
> ===
> --- linux.orig/include/linux/sysctl.h	2008-02-04 12:29:01.0 +0100
> +++ linux/include/linux/sysctl.h	2008-02-07 15:19:06.0 +0100
> @@ -1041,6 +1041,7 @@ struct ctl_table
>  	void *data;
>  	int maxlen;
>  	mode_t mode;
> +	int cap;			/* Capability needed to read/write */
>  	struct ctl_table *child;
>  	struct ctl_table *parent;	/* Automatically set */
>  	proc_handler *proc_handler;	/* Callback for text formatting */
> Index: linux/kernel/sysctl.c
> ===
> --- linux.orig/kernel/sysctl.c	2008-02-05 22:17:05.0 +0100
> +++ linux/kernel/sysctl.c	2008-02-07 15:30:45.0 +0100
> @@ -1527,14 +1527,26 @@ out:
>   * some sysctl variables are readonly even to root.
>   */
>
> -static int test_perm(int mode, int op)
> +static int test_perm(struct ctl_table *table, int op)
>  {
> -	if (!current->euid)
> -		mode >>= 6;
> -	else if (in_egroup_p(0))
> -		mode >>= 3;
> +	int cap = table->cap;
> +	mode_t mode = table->mode;
> +
> +	if (!cap)
> +		cap = CAP_SYS_ADMIN;
> +
> +	if ((op & MAY_READ) && !(mode & S_IRUGO))
> +		return -EACCES;
> +
> +	if ((op & MAY_WRITE) && !(mode & S_IWUGO))
> +		return -EACCES;
> +
> +	if (capable(cap))
> +		return 0;
> +
>  	if ((mode & op & 0007) == op)
>  		return 0;
> +
>  	return -EACCES;

I like how simple it appears to be :)

At first I missed the fact that owning uid is always 0 so I thought the
uid processing wasn't quite enough.  But since it's always 0, the only
question is whether there are any /proc/sys files whose users currently
depend on being setgid 0 and setgid non-0 with no capabilities.

On my laptop, 'find /proc/sys -type f -perm -020' gives me no results,
so that is promising.

So this certainly seems like a good first step.  In fact, combined with
/proc/sys/ being partially remounted per container like /proc/sys/net is
doing, we may not even need to do anything with CAP_NS_OVERRIDE.

thanks,
-serge

>  }
>
> @@ -1544,7 +1556,7 @@ int sysctl_perm(struct ctl_table *table,
>  	error = security_sysctl(table, op);
>  	if (error)
>  		return error;
> -	return test_perm(table->mode, op);
> +	return test_perm(table, op);
>  }
>
>  #ifdef CONFIG_SYSCTL_SYSCALL


Re: [patch 07/10] unprivileged mounts: add sysctl tunable for safe property

2008-02-06 Thread Serge E. Hallyn
Quoting Miklos Szeredi ([EMAIL PROTECTED]):
> From: Miklos Szeredi [EMAIL PROTECTED]
>
> Add the following:
>
>   /proc/sys/fs/types/${FS_TYPE}/usermount_safe
>
> Signed-off-by: Miklos Szeredi [EMAIL PROTECTED]

Thanks, Miklos, good explanations in the docs.

Acked-by: Serge Hallyn [EMAIL PROTECTED]

One comment inline, but not imo your problem :)

> ---
>
> Index: linux/fs/filesystems.c
> ===
> --- linux.orig/fs/filesystems.c	2008-02-04 23:47:46.0 +0100
> +++ linux/fs/filesystems.c	2008-02-04 23:48:04.0 +0100
> @@ -12,6 +12,7 @@
>  #include <linux/kmod.h>
>  #include <linux/init.h>
>  #include <linux/module.h>
> +#include <linux/sysctl.h>
>  #include <asm/uaccess.h>
>
>  /*
> @@ -51,6 +52,57 @@ static struct file_system_type **find_fi
>  	return p;
>  }
>
> +#define MAX_FILESYSTEM_VARS 1
> +
> +struct filesystem_sysctl_table {
> +	struct ctl_table_header *header;
> +	struct ctl_table table[MAX_FILESYSTEM_VARS + 1];
> +};
> +
> +/*
> + * Create /sys/fs/types/${FSNAME} directory with per fs-type tunables.
> + */
> +static int filesystem_sysctl_register(struct file_system_type *fs)
> +{
> +	struct filesystem_sysctl_table *t;
> +	struct ctl_path path[] = {
> +		{ .procname = "fs", .ctl_name = CTL_FS },
> +		{ .procname = "types", .ctl_name = CTL_UNNUMBERED },
> +		{ .procname = fs->name, .ctl_name = CTL_UNNUMBERED },
> +		{ }
> +	};
> +
> +	t = kzalloc(sizeof(*t), GFP_KERNEL);
> +	if (!t)
> +		return -ENOMEM;
> +
> +
> +	t->table[0].ctl_name = CTL_UNNUMBERED;
> +	t->table[0].procname = "usermount_safe";
> +	t->table[0].maxlen = sizeof(int);
> +	t->table[0].data = &fs->fs_safe;
> +	t->table[0].mode = 0644;

Yikes, this could be a problem for containers, as it's simply tied to
uid 0, whereas tying it to a capability would let us solve it with
capability bounds.

This might mean more urgency to get user namespaces working at least
with sysfs, else this is a quick way around having CAP_SYS_ADMIN taken
out of a container's capability bounding set.

> +	t->table[0].proc_handler = proc_dointvec;
> +
> +	t->header = register_sysctl_paths(path, t->table);
> +	if (!t->header) {
> +		kfree(t);
> +		return -ENOMEM;
> +	}
> +
> +	fs->sysctl_table = t;
> +
> +	return 0;
> +}
> +
> +static void filesystem_sysctl_unregister(struct file_system_type *fs)
> +{
> +	struct filesystem_sysctl_table *t = fs->sysctl_table;
> +
> +	unregister_sysctl_table(t->header);
> +	kfree(t);
> +}
> +
>  /**
>   *	register_filesystem - register a new filesystem
>   *	@fs: the file system structure
> @@ -80,6 +132,13 @@ int register_filesystem(struct file_syst
>  	else
>  		*p = fs;
>  	write_unlock(&file_systems_lock);
> +
> +	if (res == 0) {
> +		res = filesystem_sysctl_register(fs);
> +		if (res != 0)
> +			unregister_filesystem(fs);
> +	}
> +
>  	return res;
>  }
>
> @@ -108,6 +167,7 @@ int unregister_filesystem(struct file_sy
>  			*tmp = fs->next;
>  			fs->next = NULL;
>  			write_unlock(&file_systems_lock);
> +			filesystem_sysctl_unregister(fs);
>  			return 0;
>  		}
>  		tmp = &(*tmp)->next;
> Index: linux/include/linux/fs.h
> ===
> --- linux.orig/include/linux/fs.h	2008-02-04 23:48:02.0 +0100
> +++ linux/include/linux/fs.h	2008-02-04 23:48:04.0 +0100
> @@ -1444,6 +1444,7 @@ struct file_system_type {
>  	struct module *owner;
>  	struct file_system_type * next;
>  	struct list_head fs_supers;
> +	struct filesystem_sysctl_table *sysctl_table;
>
>  	struct lock_class_key s_lock_key;
>  	struct lock_class_key s_umount_key;
> Index: linux/Documentation/filesystems/proc.txt
> ===
> --- linux.orig/Documentation/filesystems/proc.txt	2008-02-04 23:47:58.0 +0100
> +++ linux/Documentation/filesystems/proc.txt	2008-02-04 23:48:04.0 +0100
> @@ -44,6 +44,7 @@ Table of Contents
>    2.14   /proc/<pid>/io - Display the IO accounting fields
>    2.15   /proc/<pid>/coredump_filter - Core dump filtering settings
>    2.16   /proc/<pid>/mountinfo - Information about mounts
> +  2.17   /proc/sys/fs/types - File system type specific parameters
>
>
> --
>  Preface
> @@ -2392,4 +2393,34 @@ For more information see:
>  Documentation/filesystems/sharedsubtree.txt
>
>
> +2.17 /proc/sys/fs/types/ - File system type specific parameters
> +
> +
> +There's a separate directory /proc/sys/fs/types/<type>/ for each
> +filesystem type, containing the following files:
> +
> +usermount_safe
> +--
> +
> +Setting this to non-zero will allow 

Re: [patch 07/10] unprivileged mounts: add sysctl tunable for safe property

2008-02-06 Thread Miklos Szeredi
> > +	t->table[0].mode = 0644;
>
> Yikes, this could be a problem for containers, as it's simply tied to
> uid 0, whereas tying it to a capability would let us solve it with
> capability bounds.
>
> This might mean more urgency to get user namespaces working at least
> with sysfs, else this is a quick way around having CAP_SYS_ADMIN taken
> out of a container's capability bounding set.

I think I understand the problem, but not the solution.  How are user
namespaces going to help?

Maybe sysctls just need to check capabilities, instead of uids.  I
think that would make a lot of sense anyway.

Thanks,
Miklos


Re: [patch 07/10] unprivileged mounts: add sysctl tunable for safe property

2008-02-06 Thread Serge E. Hallyn
Quoting Miklos Szeredi ([EMAIL PROTECTED]):
> > > +	t->table[0].mode = 0644;
> >
> > Yikes, this could be a problem for containers, as it's simply tied to
> > uid 0, whereas tying it to a capability would let us solve it with
> > capability bounds.
> >
> > This might mean more urgency to get user namespaces working at least
> > with sysfs, else this is a quick way around having CAP_SYS_ADMIN taken
> > out of a container's capability bounding set.
>
> I think I understand the problem, but not the solution.  How are user
> namespaces going to help?

Well it somewhat depends on how we implement userns for filesystems
in the first place, and whether we end up splitting sysfs into
sub-filesystems as I think Eric Biederman has been advocating.  My
thoughts had been running along the lines of just tagging vfsmounts
with userns of the mounting process.  A task from outside the mounting
process' namespace would get "user other" permissions whether or not
its uid was the owning uid or uid 0 (unless the task had CAP_NS_OVERRIDE).

But really it gets more complicated for sysfs than something like ext2
since we really want to be able to filter files and directories for
different namespaces...  Handling sysfs user namespaces before we sort
out the rest of the sysfs stuff (being hashed out with network
namespaces) seems like jumping the gun a bit.

> Maybe sysctls just need to check capabilities, instead of uids.  I
> think that would make a lot of sense anyway.

Would it be as simple as tagging the inodes with capability sets?  One
set for writing, or one each for reading and writing?

thanks,
-serge


Re: [patch 07/10] unprivileged mounts: add sysctl tunable for safe property

2008-01-22 Thread Serge E. Hallyn
Quoting Miklos Szeredi ([EMAIL PROTECTED]):
> > > > What do you think about doing this only if FS_SAFE is also set,
> > > > so for instance at first only FUSE would allow itself to be
> > > > made user-mountable?
> > > >
> > > > A safe thing to do, or overly intrusive?
> > >
> > > It goes somewhat against the "no policy in kernel" policy ;).  I think
> > > the warning in the documentation should be enough to make sysadmins
> > > think twice before doing anything foolish:
> >
> > Warning in which documentation?  A sysadmin considering setting fs_safe
> > for ext2 or xfs isn't going to be looking at fuse docs, which I think is
> > what you're talking about.  Are you going to add a file under
> > Documentation/filesystems?
>
> Yes, I meant documentation of the new sysctl tunable in
> Documentation/filesystems/proc.txt:

Argh, sorry.

> > Index: linux/Documentation/filesystems/proc.txt
> > ===
> > --- linux.orig/Documentation/filesystems/proc.txt	2008-01-16 13:25:07.0 +0100
> > +++ linux/Documentation/filesystems/proc.txt	2008-01-16 13:25:09.0 +0100
> > @@ -43,6 +43,7 @@ Table of Contents
> >    2.13 /proc/<pid>/oom_score - Display current oom-killer score
> >    2.14 /proc/<pid>/io - Display the IO accounting fields
> >    2.15 /proc/<pid>/coredump_filter - Core dump filtering settings
> > +  2.16 /proc/sys/fs/types - File system type specific parameters
> >
> >
> > --
> >  Preface
> > @@ -2283,4 +2284,21 @@ For example:
> >    $ echo 0x7 > /proc/self/coredump_filter
> >    $ ./some_program
> >
> > +2.16 /proc/sys/fs/types/ - File system type specific parameters
> > +
> > +
> > +There's a separate directory /proc/sys/fs/types/<type>/ for each
> > +filesystem type, containing the following files:
> > +
> > +usermount_safe
> > +--
> > +
> > +Setting this to non-zero will allow filesystems of this type to be
> > +mounted by unprivileged users (note, that there are other
> > +prerequisites as well).
> > +
> > +Care should be taken when enabling this, since most
> > +filesystems haven't been designed with unprivileged mounting
> > +in mind.
> > +
> >
> > --
> >
>
> Do you think this is enough?  Or do we need something more, to prevent
> sysadmin inadvertently setting this for an unsafe filesystem?

I would think something more would be good.  First explaining
that fuse should be safe modulo warnings in the fuse documentation,
procfs and sysfs may be safe, while other filesystems are not known safe
at all.

Then explaining the dangers with not-known-safe filesystems and what is
needed to make them safe.  Clearly making sure input validation is
properly done so for instance getsb() doesn't turn into a buffer
overflow, etc.

Such a checklist also would be useful for holding a meaningful discussion
about the other filesystems and maybe turning some people loose on
an audit of other filesystems.

thanks,
-serge


Re: [patch 07/10] unprivileged mounts: add sysctl tunable for safe property

2008-01-21 Thread Miklos Szeredi
> What do you think about doing this only if FS_SAFE is also set,
> so for instance at first only FUSE would allow itself to be
> made user-mountable?
>
> A safe thing to do, or overly intrusive?

It goes somewhat against the "no policy in kernel" policy ;).  I think
the warning in the documentation should be enough to make sysadmins
think twice before doing anything foolish:

> +Care should be taken when enabling this, since most
> +filesystems haven't been designed with unprivileged mounting
> +in mind.
> +

BTW, filesystems like 'proc' and 'sysfs' should also be safe, although
the only use for them being marked safe is if the users are allowed to
umount them from their private namespace (otherwise a 'mount --bind'
has the same effect as a new mount).

Thanks,
Miklos