Re: [PATCH v3] capabilities.7, prctl.2: Document ambient capabilities

2015-12-04 Thread Michael Kerrisk (man-pages)
Hi Andy,

I have applied your patch (below). Thanks for writing it.
But I have a question or two and a request.

===

In the capabilities(7) page tehre is the longstanding text:

   An  application  can use the following call to lock itself, and
   all of its descendants, into an environment where the only  way
   of  gaining capabilities is by executing a program with associ‐
   ated file capabilities:

   prctl(PR_SET_SECUREBITS,
   SECBIT_KEEP_CAPS_LOCKED |
   SECBIT_NO_SETUID_FIXUP |
   SECBIT_NO_SETUID_FIXUP_LOCKED |
   SECBIT_NOROOT |
   SECBIT_NOROOT_LOCKED);

As far as I can estimate, no changes are needed here to include 
SECBIT_NO_CAP_AMBIENT_RAISE and SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED
in the above prctl() call, but could you confirm please?

===

In the message for kernel commit 
58319057b7847667f0c9585b9de0e8932b0fdb08
you included this text:

[[
Because capability inheritance is so
broken, setting KEEPCAPS, using setresuid to switch to nonroot uids, and
then calling execve effectively drops capabilities.  Therefore,
setresuid from root to nonroot conditionally clears pA unless
SECBIT_NO_SETUID_FIXUP is set.  Processes that don't like this can
re-add bits to pA afterwards.
]]

I'm struggling to understand the significance of this text,
especially as your man-pages patch makes no mention of this point.

The thing is, that text ("Therefore...") implies that there's
something special going on beyond the rules already documented
elsewhere. I mean, according to the rules aly documented elsewhere
in the page:

(1) If a process with UIDs of 0 sets all its UIDs
nonzero, then, the permitted and effective sets are cleared
(that's the classical behavior), and because the permitted
set is cleared, then so is the ambient set.

(2) And if we set SECBIT_NO_SETUID_FIXUP then
a UID 0 ==> nonzero transition doesn't clear permitted and
effective sets, and then of course the ambient set is not
cleared.

So, what additional point were you meaning to convey in
the commit message? (Maybe it was just cruft in the commit
message, but if not, can you explain precisely the arguments 
for setresuid() that are supposed to generate the special
behavior described by the above text.)

===

I did quite a bit of tweaking of the text that you added for
the capabilities page. Could you please check the following
to make sure I added no errors:

   Ambient (since Linux 4.3):
  This  is a set of capabilities that are preserved across
  an execve(2) of a program that is not  privileged.   The
  ambient capability set obeys the invariant that no capa‐
  bility can ever be ambient if it is not  both  permitted
  and inheritable.

  The  ambient  capability  set  can  be directly modified
  using prctl(2).  Ambient capabilities are  automatically
  lowered  if  either  of  the  corresponding permitted or
  inheritable capabilities is lowered.

  Executing a program that changes UID or GID due  to  the
  set-user-ID  or set-group-ID bits or executing a program
  that has any file capabilities set will clear the  ambi‐
  ent  set.  Ambient capabilities are added to the permit‐
  ted set and assigned to the effective set when execve(2)
  is called.

Cheers,

Michael

On 11/04/2015 12:42 AM, Andy Lutomirski wrote:
> Reviewed-by: Kees Cook 
> Signed-off-by: Andy Lutomirski 
> ---
> 
> Changes from v2: Add a note about arg3 == 0 in CLEAR_ALL.
> 
> man2/prctl.2| 13 +
>  man7/capabilities.7 | 40 ++--
>  2 files changed, 47 insertions(+), 6 deletions(-)
> 
> diff --git a/man2/prctl.2 b/man2/prctl.2
> index e743a6305969..bf8680f3b62d 100644
> --- a/man2/prctl.2
> +++ b/man2/prctl.2
> @@ -954,6 +954,19 @@ had been called.
>  For further information on Intel MPX, see the kernel source file
>  .IR Documentation/x86/intel_mpx.txt .
>  .\"
> +.TP
> +.BR PR_CAP_AMBIENT " (since Linux 4.2)"
> +Reads or changes the ambient capability set.  If arg2 is 
> PR_CAP_AMBIENT_RAISE,
> +then the capability specified in arg3 is added to the ambient set.  This will
> +fail, returning EPERM, if the capability is not already both permitted and
> +inheritable or if the SECBIT_NO_CAP_AMBIENT_RAISE securebit is set.  If arg2
> +is PR_CAP_AMBIENT_LOWER, then the capability specified in arg3 is removed
> +from the ambient set.  If arg2 is PR_CAP_AMBIENT_IS_SET, then
> +.BR prctl (2)
> +will return 1 if the capability in arg3 is in the ambient set and 0 if not.
> +If arg2 is PR_CAP_AMBIENT_CLEAR_ALL, then all capabilities will
> +be removed from the ambient set.  (Using PR_CAP_AMBIENT_CLEAR_ALL requires
> +setting arg3 to zero.)
>  .SH RETURN VALUE
>  On success,
>  .BR PR_GET_DUMPABLE ,
> 

Re: [PATCH 17/19] fuse: Support fuse filesystems outside of init_user_ns

2015-12-04 Thread Seth Forshee
On Wed, Dec 02, 2015 at 09:40:17AM -0600, Seth Forshee wrote:
> @@ -155,11 +155,22 @@ static ino_t fuse_squash_ino(u64 ino64)
>   return ino;
>  }
>  
> -void fuse_change_attributes_common(struct inode *inode, struct fuse_attr 
> *attr,
> -u64 attr_valid)
> +int fuse_change_attributes_common(struct inode *inode, struct fuse_attr 
> *attr,
> +   u64 attr_valid)
>  {
>   struct fuse_conn *fc = get_fuse_conn(inode);
>   struct fuse_inode *fi = get_fuse_inode(inode);
> + kuid_t uid;
> + kgid_t gid;
> +
> + uid = make_kuid(fc->user_ns, attr->uid);
> + gid = make_kgid(fc->user_ns, attr->gid);
> + if (!uid_valid(uid) || !gid_valid(gid)) {
> + make_bad_inode(inode);
> + return -EIO;
> + }

Eric - I had kind of forgotten about this part until just now, but
previously with these patches we had discussed how to handle ids from
the filesystem that aren't valid in s_user_ns. My intention is to set
the kuids in the inode to invalid, and in these patches I've updated the
vfs so that it should be safe to do that. But at some point I think you
had suggested marking the inodes bad, and I must have added this as a
result. I guess we need to decide which way to go. I favor using invalid
ids so that a user privileged in s_user_ns can still access the inode,
change ownership, etc., but I'm interested to hear your opinion.

Thanks,
Seth
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] Smack: harmless underflow in smk_set_cipso()

2015-12-04 Thread Dan Carpenter
On Thu, Dec 03, 2015 at 02:23:22PM -0800, Casey Schaufler wrote:
> On 11/3/2015 2:15 PM, Dan Carpenter wrote:
> >Also checkpatch complains that we should use kstrtouint() instead of
> >sscanf here.
> >
> >Signed-off-by: Dan Carpenter 
> 
> This no longer parses cipso specifications correctly.
> I haven't investigated why. It looks as if it should
> work. Does kstrtouint() allow for leading whitespace?

No it doesn't.  I feel slightly betrayed because checkpatch.pl told me
to introduce that bug.  I have said so many times that I think
checkpatch.pl should not tell people to introduce bugs...

regards,
dan carpenter

--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/19] fs: Refuse uid/gid changes which don't map into s_user_ns

2015-12-04 Thread Serge E. Hallyn
On Wed, Dec 02, 2015 at 09:40:09AM -0600, Seth Forshee wrote:
> Add checks to inode_change_ok to verify that uid and gid changes
> will map into the superblock's user namespace. If they do not
> fail with -EOVERFLOW. This cannot be overriden with ATTR_FORCE.
> 
> Signed-off-by: Seth Forshee 

Acked-by: Serge Hallyn 

...  although i could see root on the host being upset that it can't
assign a uid not valid in the mounter's ns.  But it does seem safer.

> ---
>  fs/attr.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/fs/attr.c b/fs/attr.c
> index 6530ced19697..55b46e3aa888 100644
> --- a/fs/attr.c
> +++ b/fs/attr.c
> @@ -42,6 +42,17 @@ int inode_change_ok(const struct inode *inode, struct 
> iattr *attr)
>   return error;
>   }
>  
> + /*
> +  * Verify that uid/gid changes are valid in the target namespace
> +  * of the superblock. This cannot be overriden using ATTR_FORCE.
> +  */
> + if (ia_valid & ATTR_UID &&
> + from_kuid(inode->i_sb->s_user_ns, attr->ia_uid) == (uid_t)-1)
> + return -EOVERFLOW;
> + if (ia_valid & ATTR_GID &&
> + from_kgid(inode->i_sb->s_user_ns, attr->ia_gid) == (gid_t)-1)
> + return -EOVERFLOW;
> +
>   /* If force is set do it anyway. */
>   if (ia_valid & ATTR_FORCE)
>   return 0;
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/19] fs: Refuse uid/gid changes which don't map into s_user_ns

2015-12-04 Thread Seth Forshee
On Fri, Dec 04, 2015 at 11:27:38AM -0600, Serge E. Hallyn wrote:
> On Wed, Dec 02, 2015 at 09:40:09AM -0600, Seth Forshee wrote:
> > Add checks to inode_change_ok to verify that uid and gid changes
> > will map into the superblock's user namespace. If they do not
> > fail with -EOVERFLOW. This cannot be overriden with ATTR_FORCE.
> > 
> > Signed-off-by: Seth Forshee 
> 
> Acked-by: Serge Hallyn 
> 
> ...  although i could see root on the host being upset that it can't
> assign a uid not valid in the mounter's ns.  But it does seem safer.

That change wouldn't be representable in the backing store though, and
that could lead to unexpected behaviour. It's better to tell root that
we can't make the requested change, in my opinion.
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/19] block_dev: Support checking inode permissions in lookup_bdev()

2015-12-04 Thread Serge E. Hallyn
On Wed, Dec 02, 2015 at 09:40:01AM -0600, Seth Forshee wrote:
> When looking up a block device by path no permission check is
> done to verify that the user has access to the block device inode
> at the specified path. In some cases it may be necessary to
> check permissions towards the inode, such as allowing
> unprivileged users to mount block devices in user namespaces.
> 
> Add an argument to lookup_bdev() to optionally perform this
> permission check. A value of 0 skips the permission check and
> behaves the same as before. A non-zero value specifies the mask
> of access rights required towards the inode at the specified
> path. The check is always skipped if the user has CAP_SYS_ADMIN.
> 
> All callers of lookup_bdev() currently pass a mask of 0, so this
> patch results in no functional change. Subsequent patches will
> add permission checks where appropriate.
> 
> Signed-off-by: Seth Forshee 

Acked-by: Serge Hallyn 

> ---
>  drivers/md/bcache/super.c |  2 +-
>  drivers/md/dm-table.c |  2 +-
>  drivers/mtd/mtdsuper.c|  2 +-
>  fs/block_dev.c| 13 ++---
>  fs/quota/quota.c  |  2 +-
>  include/linux/fs.h|  2 +-
>  6 files changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 679a093a3bf6..e8287b0d1dac 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1926,7 +1926,7 @@ static ssize_t register_bcache(struct kobject *k, 
> struct kobj_attribute *attr,
> sb);
>   if (IS_ERR(bdev)) {
>   if (bdev == ERR_PTR(-EBUSY)) {
> - bdev = lookup_bdev(strim(path));
> + bdev = lookup_bdev(strim(path), 0);
>   mutex_lock(_register_lock);
>   if (!IS_ERR(bdev) && bch_is_open(bdev))
>   err = "device already registered";
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 061152a43730..81c60b2495ed 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -380,7 +380,7 @@ int dm_get_device(struct dm_target *ti, const char *path, 
> fmode_t mode,
>   BUG_ON(!t);
>  
>   /* convert the path to a device */
> - bdev = lookup_bdev(path);
> + bdev = lookup_bdev(path, 0);
>   if (IS_ERR(bdev)) {
>   dev = name_to_dev_t(path);
>   if (!dev)
> diff --git a/drivers/mtd/mtdsuper.c b/drivers/mtd/mtdsuper.c
> index 20c02a3b7417..b5b60e1af31c 100644
> --- a/drivers/mtd/mtdsuper.c
> +++ b/drivers/mtd/mtdsuper.c
> @@ -176,7 +176,7 @@ struct dentry *mount_mtd(struct file_system_type 
> *fs_type, int flags,
>   /* try the old way - the hack where we allowed users to mount
>* /dev/mtdblock$(n) but didn't actually _use_ the blockdev
>*/
> - bdev = lookup_bdev(dev_name);
> + bdev = lookup_bdev(dev_name, 0);
>   if (IS_ERR(bdev)) {
>   ret = PTR_ERR(bdev);
>   pr_debug("MTDSB: lookup_bdev() returned %d\n", ret);
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index f90d91efa1b4..3ebbde85d898 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1426,7 +1426,7 @@ struct block_device *blkdev_get_by_path(const char 
> *path, fmode_t mode,
>   struct block_device *bdev;
>   int err;
>  
> - bdev = lookup_bdev(path);
> + bdev = lookup_bdev(path, 0);
>   if (IS_ERR(bdev))
>   return bdev;
>  
> @@ -1736,12 +1736,14 @@ EXPORT_SYMBOL(ioctl_by_bdev);
>  /**
>   * lookup_bdev  - lookup a struct block_device by name
>   * @pathname:special file representing the block device
> + * @mask:rights to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
>   *
>   * Get a reference to the blockdevice at @pathname in the current
>   * namespace if possible and return it.  Return ERR_PTR(error)
> - * otherwise.
> + * otherwise.  If @mask is non-zero, check for access rights to the
> + * inode at @pathname.
>   */
> -struct block_device *lookup_bdev(const char *pathname)
> +struct block_device *lookup_bdev(const char *pathname, int mask)
>  {
>   struct block_device *bdev;
>   struct inode *inode;
> @@ -1756,6 +1758,11 @@ struct block_device *lookup_bdev(const char *pathname)
>   return ERR_PTR(error);
>  
>   inode = d_backing_inode(path.dentry);
> + if (mask != 0 && !capable(CAP_SYS_ADMIN)) {
> + error = __inode_permission(inode, mask);
> + if (error)
> + goto fail;
> + }
>   error = -ENOTBLK;
>   if (!S_ISBLK(inode->i_mode))
>   goto fail;
> diff --git a/fs/quota/quota.c b/fs/quota/quota.c
> index 3746367098fd..a40eaecbd5cc 100644
> --- a/fs/quota/quota.c
> +++ b/fs/quota/quota.c
> @@ -733,7 +733,7 @@ static struct super_block *quotactl_block(const char 
> __user *special, int cmd)
>  
>   if (IS_ERR(tmp))
>   return ERR_CAST(tmp);

Re: [PATCH 07/19] fs: Check for invalid i_uid in may_follow_link()

2015-12-04 Thread Serge E. Hallyn
On Wed, Dec 02, 2015 at 09:40:07AM -0600, Seth Forshee wrote:
> Filesystem uids which don't map into a user namespace may result
> in inode->i_uid being INVALID_UID. A symlink and its parent
> could have different owners in the filesystem can both get
> mapped to INVALID_UID, which may result in following a symlink
> when this would not have otherwise been permitted when protected
> symlinks are enabled.
> 
> Add a new helper function, uid_valid_eq(), and use this to
> validate that the ids in may_follow_link() are both equal and
> valid. Also add an equivalent helper for gids, which is
> currently unused.
> 
> Signed-off-by: Seth Forshee 

Acked-by: Serge Hallyn 

> ---
>  fs/namei.c |  2 +-
>  include/linux/uidgid.h | 10 ++
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 288e8a74bf88..4ccafd391697 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -902,7 +902,7 @@ static inline int may_follow_link(struct nameidata *nd)
>   return 0;
>  
>   /* Allowed if parent directory and link owner match. */
> - if (uid_eq(parent->i_uid, inode->i_uid))
> + if (uid_valid_eq(parent->i_uid, inode->i_uid))
>   return 0;
>  
>   if (nd->flags & LOOKUP_RCU)
> diff --git a/include/linux/uidgid.h b/include/linux/uidgid.h
> index 03835522dfcb..e09529fe2668 100644
> --- a/include/linux/uidgid.h
> +++ b/include/linux/uidgid.h
> @@ -117,6 +117,16 @@ static inline bool gid_valid(kgid_t gid)
>   return __kgid_val(gid) != (gid_t) -1;
>  }
>  
> +static inline bool uid_valid_eq(kuid_t left, kuid_t right)
> +{
> + return uid_eq(left, right) && uid_valid(left);
> +}
> +
> +static inline bool gid_valid_eq(kgid_t left, kgid_t right)
> +{
> + return gid_eq(left, right) && gid_valid(left);
> +}
> +
>  #ifdef CONFIG_USER_NS
>  
>  extern kuid_t make_kuid(struct user_namespace *from, uid_t uid);
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/19] cred: Reject inodes with invalid ids in set_create_file_as()

2015-12-04 Thread Serge E. Hallyn
On Wed, Dec 02, 2015 at 09:40:08AM -0600, Seth Forshee wrote:
> Using INVALID_[UG]ID for the LSM file creation context doesn't
> make sense, so return an error if the inode passed to
> set_create_file_as() has an invalid id.
> 
> Signed-off-by: Seth Forshee 

Acked-by: Serge Hallyn 

> ---
>  kernel/cred.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/cred.c b/kernel/cred.c
> index 71179a09c1d6..ff8606f77d90 100644
> --- a/kernel/cred.c
> +++ b/kernel/cred.c
> @@ -689,6 +689,8 @@ EXPORT_SYMBOL(set_security_override_from_ctx);
>   */
>  int set_create_files_as(struct cred *new, struct inode *inode)
>  {
> + if (!uid_valid(inode->i_uid) || !gid_valid(inode->i_gid))
> + return -EINVAL;
>   new->fsuid = inode->i_uid;
>   new->fsgid = inode->i_gid;
>   return security_kernel_create_files_as(new, inode);
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/19] userns: Replace in_userns with current_in_userns

2015-12-04 Thread Serge E. Hallyn
On Wed, Dec 02, 2015 at 09:40:05AM -0600, Seth Forshee wrote:
> All current callers of in_userns pass current_user_ns as the
> first argument. Simplify by replacing in_userns with
> current_in_userns which checks whether current_user_ns is in the
> namespace supplied as an argument.
> 
> Signed-off-by: Seth Forshee 
> Acked-by: James Morris 

Acked-by: Serge Hallyn 

> ---
>  fs/namespace.c | 2 +-
>  include/linux/user_namespace.h | 6 ++
>  kernel/user_namespace.c| 6 +++---
>  security/commoncap.c   | 2 +-
>  4 files changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 2101ce7b96ab..18fc58760aec 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3286,7 +3286,7 @@ bool mnt_may_suid(struct vfsmount *mnt)
>* in other namespaces.
>*/
>   return !(mnt->mnt_flags & MNT_NOSUID) && check_mnt(real_mount(mnt)) &&
> -in_userns(current_user_ns(), mnt->mnt_sb->s_user_ns);
> +current_in_userns(mnt->mnt_sb->s_user_ns);
>  }
>  
>  static struct ns_common *mntns_get(struct task_struct *task)
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index a43faa727124..9217169c64cb 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -72,8 +72,7 @@ extern ssize_t proc_projid_map_write(struct file *, const 
> char __user *, size_t,
>  extern ssize_t proc_setgroups_write(struct file *, const char __user *, 
> size_t, loff_t *);
>  extern int proc_setgroups_show(struct seq_file *m, void *v);
>  extern bool userns_may_setgroups(const struct user_namespace *ns);
> -extern bool in_userns(const struct user_namespace *ns,
> -   const struct user_namespace *target_ns);
> +extern bool current_in_userns(const struct user_namespace *target_ns);
>  #else
>  
>  static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
> @@ -103,8 +102,7 @@ static inline bool userns_may_setgroups(const struct 
> user_namespace *ns)
>   return true;
>  }
>  
> -static inline bool in_userns(const struct user_namespace *ns,
> -  const struct user_namespace *target_ns)
> +static inline bool current_in_userns(const struct user_namespace *target_ns)
>  {
>   return true;
>  }
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 69fbc377357b..5960edc7e644 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -949,10 +949,10 @@ bool userns_may_setgroups(const struct user_namespace 
> *ns)
>   * Returns true if @ns is the same namespace as or a descendant of
>   * @target_ns.
>   */
> -bool in_userns(const struct user_namespace *ns,
> -const struct user_namespace *target_ns)
> +bool current_in_userns(const struct user_namespace *target_ns)
>  {
> - for (; ns; ns = ns->parent) {
> + struct user_namespace *ns;
> + for (ns = current_user_ns(); ns; ns = ns->parent) {
>   if (ns == target_ns)
>   return true;
>   }
> diff --git a/security/commoncap.c b/security/commoncap.c
> index 6243aef5860e..2119421613f6 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -450,7 +450,7 @@ static int get_file_caps(struct linux_binprm *bprm, bool 
> *effective, bool *has_c
>  
>   if (!mnt_may_suid(bprm->file->f_path.mnt))
>   return 0;
> - if (!in_userns(current_user_ns(), 
> bprm->file->f_path.mnt->mnt_sb->s_user_ns))
> + if (!current_in_userns(bprm->file->f_path.mnt->mnt_sb->s_user_ns))
>   return 0;
>  
>   rc = get_vfs_caps_from_disk(bprm->file->f_path.dentry, );
> -- 
> 1.9.1
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/19] fs: Treat foreign mounts as nosuid

2015-12-04 Thread Serge E. Hallyn
On Wed, Dec 02, 2015 at 09:40:03AM -0600, Seth Forshee wrote:
> From: Andy Lutomirski 
> 
> If a process gets access to a mount from a different user
> namespace, that process should not be able to take advantage of
> setuid files or selinux entrypoints from that filesystem.  Prevent
> this by treating mounts from other mount namespaces and those not
> owned by current_user_ns() or an ancestor as nosuid.
> 
> This will make it safer to allow more complex filesystems to be
> mounted in non-root user namespaces.
> 
> This does not remove the need for MNT_LOCK_NOSUID.  The setuid,
> setgid, and file capability bits can no longer be abused if code in
> a user namespace were to clear nosuid on an untrusted filesystem,
> but this patch, by itself, is insufficient to protect the system
> from abuse of files that, when execed, would increase MAC privilege.
> 
> As a more concrete explanation, any task that can manipulate a
> vfsmount associated with a given user namespace already has
> capabilities in that namespace and all of its descendents.  If they
> can cause a malicious setuid, setgid, or file-caps executable to
> appear in that mount, then that executable will only allow them to
> elevate privileges in exactly the set of namespaces in which they
> are already privileges.
> 
> On the other hand, if they can cause a malicious executable to
> appear with a dangerous MAC label, running it could change the
> caller's security context in a way that should not have been
> possible, even inside the namespace in which the task is confined.
> 
> As a hardening measure, this would have made CVE-2014-5207 much
> more difficult to exploit.
> 
> Signed-off-by: Andy Lutomirski 
> Signed-off-by: Seth Forshee 
> Acked-by: James Morris 

Acked-by: Serge Hallyn 

> ---
>  fs/exec.c|  2 +-
>  fs/namespace.c   | 13 +
>  include/linux/mount.h|  1 +
>  security/commoncap.c |  2 +-
>  security/selinux/hooks.c |  2 +-
>  5 files changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index b06623a9347f..ea7311d72cc3 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1295,7 +1295,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
>   bprm->cred->euid = current_euid();
>   bprm->cred->egid = current_egid();
>  
> - if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
> + if (!mnt_may_suid(bprm->file->f_path.mnt))
>   return;
>  
>   if (task_no_new_privs(current))
> diff --git a/fs/namespace.c b/fs/namespace.c
> index da70f7c4ece1..2101ce7b96ab 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3276,6 +3276,19 @@ found:
>   return visible;
>  }
>  
> +bool mnt_may_suid(struct vfsmount *mnt)
> +{
> + /*
> +  * Foreign mounts (accessed via fchdir or through /proc
> +  * symlinks) are always treated as if they are nosuid.  This
> +  * prevents namespaces from trusting potentially unsafe
> +  * suid/sgid bits, file caps, or security labels that originate
> +  * in other namespaces.
> +  */
> + return !(mnt->mnt_flags & MNT_NOSUID) && check_mnt(real_mount(mnt)) &&
> +in_userns(current_user_ns(), mnt->mnt_sb->s_user_ns);
> +}
> +
>  static struct ns_common *mntns_get(struct task_struct *task)
>  {
>   struct ns_common *ns = NULL;
> diff --git a/include/linux/mount.h b/include/linux/mount.h
> index f822c3c11377..54a594d49733 100644
> --- a/include/linux/mount.h
> +++ b/include/linux/mount.h
> @@ -81,6 +81,7 @@ extern void mntput(struct vfsmount *mnt);
>  extern struct vfsmount *mntget(struct vfsmount *mnt);
>  extern struct vfsmount *mnt_clone_internal(struct path *path);
>  extern int __mnt_is_readonly(struct vfsmount *mnt);
> +extern bool mnt_may_suid(struct vfsmount *mnt);
>  
>  struct path;
>  extern struct vfsmount *clone_private_mount(struct path *path);
> diff --git a/security/commoncap.c b/security/commoncap.c
> index 400aa224b491..6243aef5860e 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -448,7 +448,7 @@ static int get_file_caps(struct linux_binprm *bprm, bool 
> *effective, bool *has_c
>   if (!file_caps_enabled)
>   return 0;
>  
> - if (bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID)
> + if (!mnt_may_suid(bprm->file->f_path.mnt))
>   return 0;
>   if (!in_userns(current_user_ns(), 
> bprm->file->f_path.mnt->mnt_sb->s_user_ns))
>   return 0;
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index d0cfaa9f19d0..a5b93df6553f 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -2171,7 +2171,7 @@ static int check_nnp_nosuid(const struct linux_binprm 
> *bprm,
>   const struct task_security_struct *new_tsec)
>  {
>   int nnp = (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS);
> - int nosuid = 

Re: [PATCH 02/19] block_dev: Check permissions towards block device inode when mounting

2015-12-04 Thread Serge E. Hallyn
On Wed, Dec 02, 2015 at 09:40:02AM -0600, Seth Forshee wrote:
> Unprivileged users should not be able to mount block devices when
> they lack sufficient privileges towards the block device inode.
> Update blkdev_get_by_path() to validate that the user has the
> required access to the inode at the specified path. The check
> will be skipped for CAP_SYS_ADMIN, so privileged mounts will
> continue working as before.
> 
> Signed-off-by: Seth Forshee 

Acked-by: Serge Hallyn 

> ---
>  fs/block_dev.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 3ebbde85d898..4fdb6ab59816 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -1424,9 +1424,14 @@ struct block_device *blkdev_get_by_path(const char 
> *path, fmode_t mode,
>   void *holder)
>  {
>   struct block_device *bdev;
> + int perm = 0;
>   int err;
>  
> - bdev = lookup_bdev(path, 0);
> + if (mode & FMODE_READ)
> + perm |= MAY_READ;
> + if (mode & FMODE_WRITE)
> + perm |= MAY_WRITE;
> + bdev = lookup_bdev(path, perm);
>   if (IS_ERR(bdev))
>   return bdev;
>  
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/19] fs: Refuse uid/gid changes which don't map into s_user_ns

2015-12-04 Thread Serge E. Hallyn
Quoting Seth Forshee (seth.fors...@canonical.com):
> On Fri, Dec 04, 2015 at 11:27:38AM -0600, Serge E. Hallyn wrote:
> > On Wed, Dec 02, 2015 at 09:40:09AM -0600, Seth Forshee wrote:
> > > Add checks to inode_change_ok to verify that uid and gid changes
> > > will map into the superblock's user namespace. If they do not
> > > fail with -EOVERFLOW. This cannot be overriden with ATTR_FORCE.
> > > 
> > > Signed-off-by: Seth Forshee 
> > 
> > Acked-by: Serge Hallyn 
> > 
> > ...  although i could see root on the host being upset that it can't
> > assign a uid not valid in the mounter's ns.  But it does seem safer.
> 
> That change wouldn't be representable in the backing store though, and
> that could lead to unexpected behaviour. It's better to tell root that
> we can't make the requested change, in my opinion.

Makes sense.  Thanks.
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 17/19] fuse: Support fuse filesystems outside of init_user_ns

2015-12-04 Thread Serge E. Hallyn
Quoting Seth Forshee (seth.fors...@canonical.com):
> Update fuse to translate uids and gids to/from the user namspace
> of the process servicing requests on /dev/fuse. Any ids which do
> not map into the namespace will result in errors. inodes will
> also be marked bad when unmappable ids are received from the
> userspace fuse process.
> 
> Currently no use cases are known for letting the userspace fuse
> daemon switch namespaces after opening /dev/fuse. Given this
> fact, and in order to keep the implementation as simple as
> possible and ease security auditing, the user namespace from
> which /dev/fuse is opened is used for all id translations. This
> is required to be the same namespace as s_user_ns to maintain
> behavior consistent with other filesystems which can be mounted
> in user namespaces.
> 
> For cuse the namespace used for the connection is also simply
> current_user_ns() at the time /dev/cuse is opened.
> 
> Signed-off-by: Seth Forshee 
> ---
>  fs/fuse/cuse.c   |  3 +-
>  fs/fuse/dev.c| 13 +
>  fs/fuse/dir.c| 81 ++---
>  fs/fuse/fuse_i.h | 14 ++
>  fs/fuse/inode.c  | 85 
> ++--
>  5 files changed, 136 insertions(+), 60 deletions(-)
> 
> diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
> index eae2c11268bc..a10aca57bfe4 100644
> --- a/fs/fuse/cuse.c
> +++ b/fs/fuse/cuse.c
> @@ -48,6 +48,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "fuse_i.h"
>  
> @@ -498,7 +499,7 @@ static int cuse_channel_open(struct inode *inode, struct 
> file *file)
>   if (!cc)
>   return -ENOMEM;
>  
> - fuse_conn_init(>fc);
> + fuse_conn_init(>fc, current_user_ns());
>  
>   fud = fuse_dev_alloc(>fc);
>   if (!fud) {
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index a4f6f30d6d86..11b4cb0a0e2f 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -127,8 +127,8 @@ static void __fuse_put_request(struct fuse_req *req)
>  
>  static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req *req)
>  {
> - req->in.h.uid = from_kuid_munged(_user_ns, current_fsuid());
> - req->in.h.gid = from_kgid_munged(_user_ns, current_fsgid());
> + req->in.h.uid = from_kuid(fc->user_ns, current_fsuid());
> + req->in.h.gid = from_kgid(fc->user_ns, current_fsgid());
>   req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns);
>  }
>  
> @@ -186,7 +186,8 @@ static struct fuse_req *__fuse_get_req(struct fuse_conn 
> *fc, unsigned npages,
>   __set_bit(FR_WAITING, >flags);
>   if (for_background)
>   __set_bit(FR_BACKGROUND, >flags);
> - if (req->in.h.pid == 0) {
> + if (req->in.h.pid == 0 || req->in.h.uid == (uid_t)-1 ||
> + req->in.h.gid == (gid_t)-1) {
>   fuse_put_request(fc, req);
>   return ERR_PTR(-EOVERFLOW);
>   }
> @@ -1248,7 +1249,8 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, 
> struct file *file,
>   struct fuse_in *in;
>   unsigned reqsize;
>  
> - if (task_active_pid_ns(current) != fc->pid_ns)
> + if (task_active_pid_ns(current) != fc->pid_ns ||
> + current_user_ns() != fc->user_ns)

Do you think this should be using current_in_user_ns(fc->user_ns) ?

Opening a file, forking (maybe unsharing) then acting on the file is
pretty standard fare which this would break.

(same for pidns, i guess, as well as for write below)

>   return -EIO;
>  
>   restart:
> @@ -1880,7 +1882,8 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
>   struct fuse_req *req;
>   struct fuse_out_header oh;
>  
> - if (task_active_pid_ns(current) != fc->pid_ns)
> + if (task_active_pid_ns(current) != fc->pid_ns ||
> + current_user_ns() != fc->user_ns)
>   return -EIO;
>  
>   if (nbytes < sizeof(struct fuse_out_header))
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/19] fs: Permit FIBMAP for users with CAP_SYS_RAWIO in s_user_ns

2015-12-04 Thread Serge E. Hallyn
Heh, I was looking over 
http://www.gossamer-threads.com/lists/linux/kernel/103611
a little while ago :)  The same question was asked 16 years ago.  Apparently
the answer then was that it was easier than fixing the code.

Quoting Theodore Ts'o (ty...@mit.edu):
> The fact that we need CAP_SYS_RAIO for FIBMAP is pretty silly, given
> that FIEMAP does not require privileges --- and in fact the preferred
> interface.  Why not just simply drop the requirement for privileges
> for FIBMAP?
> 
> (Seth, Serge, this isn't a real objection to your patch; but the fact
> that FIBMAP requires root has always been a bit silly, and this would
> be a great opportunity to simplify things a bit.)
> 
>   - Ted
>   
> 
> On Fri, Dec 04, 2015 at 01:11:43PM -0600, Serge E. Hallyn wrote:
> > Quoting Seth Forshee (seth.fors...@canonical.com):
> > > Signed-off-by: Seth Forshee 
> > 
> > Acked-by: Serge Hallyn 
> > 
> > > ---
> > >  fs/ioctl.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/ioctl.c b/fs/ioctl.c
> > > index 5d01d2638ca5..45c371bed7ee 100644
> > > --- a/fs/ioctl.c
> > > +++ b/fs/ioctl.c
> > > @@ -55,7 +55,7 @@ static int ioctl_fibmap(struct file *filp, int __user 
> > > *p)
> > >   /* do we support this mess? */
> > >   if (!mapping->a_ops->bmap)
> > >   return -EINVAL;
> > > - if (!capable(CAP_SYS_RAWIO))
> > > + if (!ns_capable(filp->f_inode->i_sb->s_user_ns, CAP_SYS_RAWIO))
> > >   return -EPERM;
> > >   res = get_user(block, p);
> > >   if (res)
> > > -- 
> > > 1.9.1
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > > the body of a message to majord...@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > Please read the FAQ at  http://www.tux.org/lkml/
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 18/19] fuse: Restrict allow_other to the superblock's namespace or a descendant

2015-12-04 Thread Serge E. Hallyn
Quoting Seth Forshee (seth.fors...@canonical.com):
> Unprivileged users are normally restricted from mounting with the
> allow_other option by system policy, but this could be bypassed
> for a mount done with user namespace root permissions. In such
> cases allow_other should not allow users outside the userns
> to access the mount as doing so would give the unprivileged user
> the ability to manipulate processes it would otherwise be unable
> to manipulate. Restrict allow_other to apply to users in the same
> userns used at mount or a descendant of that namespace.
> 
> Signed-off-by: Seth Forshee 
> ---
>  fs/fuse/dir.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index f67f4dd86b36..5b8edb1203b8 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1018,8 +1018,14 @@ int fuse_allow_current_process(struct fuse_conn *fc)
>  {
>   const struct cred *cred;
>  
> - if (fc->flags & FUSE_ALLOW_OTHER)
> - return 1;
> + if (fc->flags & FUSE_ALLOW_OTHER) {
> + struct user_namespace *ns;
> + for (ns = current_user_ns(); ns; ns = ns->parent) {
> + if (ns == fc->user_ns)
> + return 1;
> + }

use current_in_userns() ?

> + return 0;
> + }
>  
>   cred = current_cred();
>   if (uid_eq(cred->euid, fc->user_id) &&
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 17/19] fuse: Support fuse filesystems outside of init_user_ns

2015-12-04 Thread Seth Forshee
On Fri, Dec 04, 2015 at 02:03:55PM -0600, Serge E. Hallyn wrote:
> Quoting Seth Forshee (seth.fors...@canonical.com):
> > Update fuse to translate uids and gids to/from the user namspace
> > of the process servicing requests on /dev/fuse. Any ids which do
> > not map into the namespace will result in errors. inodes will
> > also be marked bad when unmappable ids are received from the
> > userspace fuse process.
> > 
> > Currently no use cases are known for letting the userspace fuse
> > daemon switch namespaces after opening /dev/fuse. Given this
> > fact, and in order to keep the implementation as simple as
> > possible and ease security auditing, the user namespace from
> > which /dev/fuse is opened is used for all id translations. This
> > is required to be the same namespace as s_user_ns to maintain
> > behavior consistent with other filesystems which can be mounted
> > in user namespaces.
> > 
> > For cuse the namespace used for the connection is also simply
> > current_user_ns() at the time /dev/cuse is opened.
> > 
> > Signed-off-by: Seth Forshee 
> > ---
> >  fs/fuse/cuse.c   |  3 +-
> >  fs/fuse/dev.c| 13 +
> >  fs/fuse/dir.c| 81 ++---
> >  fs/fuse/fuse_i.h | 14 ++
> >  fs/fuse/inode.c  | 85 
> > ++--
> >  5 files changed, 136 insertions(+), 60 deletions(-)
> > 
> > diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c
> > index eae2c11268bc..a10aca57bfe4 100644
> > --- a/fs/fuse/cuse.c
> > +++ b/fs/fuse/cuse.c
> > @@ -48,6 +48,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include "fuse_i.h"
> >  
> > @@ -498,7 +499,7 @@ static int cuse_channel_open(struct inode *inode, 
> > struct file *file)
> > if (!cc)
> > return -ENOMEM;
> >  
> > -   fuse_conn_init(>fc);
> > +   fuse_conn_init(>fc, current_user_ns());
> >  
> > fud = fuse_dev_alloc(>fc);
> > if (!fud) {
> > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > index a4f6f30d6d86..11b4cb0a0e2f 100644
> > --- a/fs/fuse/dev.c
> > +++ b/fs/fuse/dev.c
> > @@ -127,8 +127,8 @@ static void __fuse_put_request(struct fuse_req *req)
> >  
> >  static void fuse_req_init_context(struct fuse_conn *fc, struct fuse_req 
> > *req)
> >  {
> > -   req->in.h.uid = from_kuid_munged(_user_ns, current_fsuid());
> > -   req->in.h.gid = from_kgid_munged(_user_ns, current_fsgid());
> > +   req->in.h.uid = from_kuid(fc->user_ns, current_fsuid());
> > +   req->in.h.gid = from_kgid(fc->user_ns, current_fsgid());
> > req->in.h.pid = pid_nr_ns(task_pid(current), fc->pid_ns);
> >  }
> >  
> > @@ -186,7 +186,8 @@ static struct fuse_req *__fuse_get_req(struct fuse_conn 
> > *fc, unsigned npages,
> > __set_bit(FR_WAITING, >flags);
> > if (for_background)
> > __set_bit(FR_BACKGROUND, >flags);
> > -   if (req->in.h.pid == 0) {
> > +   if (req->in.h.pid == 0 || req->in.h.uid == (uid_t)-1 ||
> > +   req->in.h.gid == (gid_t)-1) {
> > fuse_put_request(fc, req);
> > return ERR_PTR(-EOVERFLOW);
> > }
> > @@ -1248,7 +1249,8 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, 
> > struct file *file,
> > struct fuse_in *in;
> > unsigned reqsize;
> >  
> > -   if (task_active_pid_ns(current) != fc->pid_ns)
> > +   if (task_active_pid_ns(current) != fc->pid_ns ||
> > +   current_user_ns() != fc->user_ns)
> 
> Do you think this should be using current_in_user_ns(fc->user_ns) ?
> 
> Opening a file, forking (maybe unsharing) then acting on the file is
> pretty standard fare which this would break.
> 
> (same for pidns, i guess, as well as for write below)

I'd rather leave it as is. It might be okay, but even if current_user_ns
is a child of fc->user_ns it could end up seeing ids that aren't valid
for it's namespaces, and that might cause issues for filesystems which
aren't expecting it.

But if you have a use case for a process in a child namespace servicing
requests for a mount, I'd be willing to reconsider.
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] Introduce new security.nscapability xattr

2015-12-04 Thread Serge E. Hallyn
Quoting Eric W. Biederman (ebied...@xmission.com):
> "Serge E. Hallyn"  writes:
> 
> > A common way for daemons to run with minimal privilege is to start as root,
> > perhaps setuid-root, choose a desired capability set, set PR_SET_KEEPCAPS,
> > then change uid to non-root.  A simpler way to achieve this is to set file
> > capabilities on a not-setuid-root binary.  However, when installing a 
> > package
> > inside a (user-namespaced) container, packages cannot be installed with file
> > capabilities.  For this reason, containers must install ping setuid-root.
> 
> Don't ping sockets avoid that specific problem?
> 
> I expect the general case still holds.
> 
> > To achieve this, we would need for containers to be able to request file
> > capabilities be added to a file without causing these to be honored in the
> > initial user namespace.
> >
> > To this end, the patch below introduces a new capability xattr format.  The
> > main enhancement over the existing security.capability xattr is that we
> > tag capability sets with a uid - the uid of the root user in the namespace
> > where the capabilities are set.  The capabilities will be ignored in any
> > other namespace.  The special case of uid == -1 (which must only ever be
> > able to be set by kuid 0) means use the capabilities in all
> > namespaces.

really since security.capability xattrs are currently honored in all
namespaces this isn't really necessary.  Until and unless Seth's set
changes that.

> 
> A quick comment on this.
> 
> We currently allow capabilities that have been gained to be valid in all
> descendent user namespaces.
> 
> Applying this principle to the on-disk capabilities would make it so
> that uid 0 would mean capabilities in all namespaces.
> 
> It might be worth it to introduce a fixed sized array with a length
> parameter of perhaps 32 entries which is a path of root uids as seen by
> the initial user namespace.  That way the entire construction of the
> user namespace could be verified.  AKA verify the current user namespace
> and the parent and the parents parent.  Up to the user namespace the
> current filesystem is mounted in. We would look at how much space
> allows an xattr to be stored without causing filesystems a challenge
> to properly size such an array.
> 
> Given that uids are fundamentally flat that might not be particularly
> useful.  If we add an alternative way of identifying user namespaces
> say a privileged operation that set a uuid, then the complete path would
> be more interesting.
> 
> > An alternative format would use a pair of uids to indicate a range of 
> > rootids.
> > This would allow root in a user namespace with uids 10-165536 mapped to
> > set the xattr once on a file, then launch nested containers wherein the file
> > could be used with privilege.  That's not what this patch does, but would be
> > a trivial change if people think it would be worthwhile.
> >
> > This patch does not actually address the real problem, which is setting the
> > xattrs from inside containers.  For that, I think the best solution is to
> > add a pair of new system calls, setfcap and getfcap. Userspace would for
> > instance call fsetfcap(fd, cap_user_header_t, cap_user_data_t), to which
> > the kernel would, if not in init_user_ns, react by writing an appropriate
> > security.nscapability xattr.
> 
> That feels hard to maintain, but you may be correct that we have a small
> enough userspace that it would not be a problem.
> 
> Eric
> 
> 
> > The libcap2 library's cap_set_file/cap_get_file could be switched over
> > transparently to use this to hide its use from all callers.
> >
> > Comments appreciated.
> >
> > Note - In this patch, file capabilities only work for containers which have
> > a root uid defined.  We may want to allow -1 uids to work in all
> > namespaces.  There certainly would be uses for this, but I'm a bit unsettled
> > about the implications of allowing a program privilege in a container where
> > there is no uid with privilege.  This needs more thought.

So for actually enabling (user-namespaced) containers to use these, there
are a few possibilities that come to mine.

1. A new setfcap (/getfcap) syscall.  Uses mapped uid 0 from
current_user_ns() to write a value in the security.nscapability xattr.
Userspace doesn't need to worry at all about namespace issues.

2. Just expect userspace to write a xattr;  kernel checks that no values
are changed for any other namespaces.  This could be a lot of parsing and
verifying in the kernel.

3. Switch the xattr scheme - instead of one security.nscapability xattr
with multiple entries, use security.nscapability.$(rootid).  Now the
kernel only needs to verify that the $rootid is valid for the writing
task, and we don't need a new syscall.  OTOH userspace needs to know
what it's doing.  Of course we can still hide that behind libcap2's helpers.

Any opinions on which way seems best?  1 does seem cleanest (and supports
use of seccomp 

Re: [PATCH 18/19] fuse: Restrict allow_other to the superblock's namespace or a descendant

2015-12-04 Thread Seth Forshee
On Fri, Dec 04, 2015 at 02:05:41PM -0600, Serge E. Hallyn wrote:
> Quoting Seth Forshee (seth.fors...@canonical.com):
> > Unprivileged users are normally restricted from mounting with the
> > allow_other option by system policy, but this could be bypassed
> > for a mount done with user namespace root permissions. In such
> > cases allow_other should not allow users outside the userns
> > to access the mount as doing so would give the unprivileged user
> > the ability to manipulate processes it would otherwise be unable
> > to manipulate. Restrict allow_other to apply to users in the same
> > userns used at mount or a descendant of that namespace.
> > 
> > Signed-off-by: Seth Forshee 
> > ---
> >  fs/fuse/dir.c | 10 --
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > index f67f4dd86b36..5b8edb1203b8 100644
> > --- a/fs/fuse/dir.c
> > +++ b/fs/fuse/dir.c
> > @@ -1018,8 +1018,14 @@ int fuse_allow_current_process(struct fuse_conn *fc)
> >  {
> > const struct cred *cred;
> >  
> > -   if (fc->flags & FUSE_ALLOW_OTHER)
> > -   return 1;
> > +   if (fc->flags & FUSE_ALLOW_OTHER) {
> > +   struct user_namespace *ns;
> > +   for (ns = current_user_ns(); ns; ns = ns->parent) {
> > +   if (ns == fc->user_ns)
> > +   return 1;
> > +   }
> 
>   use current_in_userns() ?

Yes, it should. I wrote this before I wrote the patch which adds that
function and never thought to go back to change it here.
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 1/1] selinux: use absolute path to include directory

2015-12-04 Thread Andy Shevchenko
Compiler warns us a lot that it can't find include folder because it's provided
in relative form.

  CC  security/selinux/netlabel.o
cc1: warning: security/selinux/include: No such file or directory 
[-Wmissing-include-dirs]
cc1: warning: security/selinux/include: No such file or directory 
[-Wmissing-include-dirs]
cc1: warning: security/selinux/include: No such file or directory 
[-Wmissing-include-dirs]
cc1: warning: security/selinux/include: No such file or directory 
[-Wmissing-include-dirs]

Add $(srctree) prefix to the path.

Signed-off-by: Andy Shevchenko 
---
 security/selinux/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/selinux/Makefile b/security/selinux/Makefile
index ad5cd76..90d1adb 100644
--- a/security/selinux/Makefile
+++ b/security/selinux/Makefile
@@ -13,7 +13,7 @@ selinux-$(CONFIG_SECURITY_NETWORK_XFRM) += xfrm.o
 
 selinux-$(CONFIG_NETLABEL) += netlabel.o
 
-ccflags-y := -Isecurity/selinux -Isecurity/selinux/include
+ccflags-y := -Isecurity/selinux -I$(srctree)/security/selinux/include
 
 $(addprefix $(obj)/,$(selinux-y)): $(obj)/flask.h
 
-- 
2.6.2

--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 15/19] capabilities: Allow privileged user in s_user_ns to set file caps

2015-12-04 Thread Serge E. Hallyn
On Fri, Dec 04, 2015 at 02:36:27PM -0600, Seth Forshee wrote:
> On Fri, Dec 04, 2015 at 01:42:06PM -0600, Serge E. Hallyn wrote:
> > Quoting Seth Forshee (seth.fors...@canonical.com):
> > > A privileged user in a super block's s_user_ns is privileged
> > > towards that file system and thus should be allowed to set file
> > > capabilities. The file capabilities will not be trusted outside
> > > of s_user_ns, so an unprivileged user cannot use this to gain
> > > privileges in a user namespace where they are not already
> > > privileged.
> > > 
> > > Signed-off-by: Seth Forshee 
> > > ---
> > >  security/commoncap.c | 12 
> > >  1 file changed, 8 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/security/commoncap.c b/security/commoncap.c
> > > index 2119421613f6..d6c80c19c449 100644
> > > --- a/security/commoncap.c
> > > +++ b/security/commoncap.c
> > > @@ -653,15 +653,17 @@ int cap_bprm_secureexec(struct linux_binprm *bprm)
> > >  int cap_inode_setxattr(struct dentry *dentry, const char *name,
> > >  const void *value, size_t size, int flags)
> > >  {
> > > + struct user_namespace *user_ns = dentry->d_sb->s_user_ns;
> > > +
> > >   if (!strcmp(name, XATTR_NAME_CAPS)) {
> > > - if (!capable(CAP_SETFCAP))
> > > + if (!ns_capable(user_ns, CAP_SETFCAP))
> > >   return -EPERM;
> > 
> > This, for file capabilities, is fine,
> > 
> > >   return 0;
> > >   }
> > >  
> > >   if (!strncmp(name, XATTR_SECURITY_PREFIX,
> > >sizeof(XATTR_SECURITY_PREFIX) - 1) &&
> > > - !capable(CAP_SYS_ADMIN))
> > > + !ns_capable(user_ns, CAP_SYS_ADMIN))
> > 
> > but this is for all other security.*.
> > 
> > It's probably still ok, but let's think about it a sec.  MAC like
> > selinux or smack should be orthogonal to DAC.  Capabilities are the
> > same in essence, but the reason they can be treated differently here
> > is because capabilties are in fact targated at a user namespace.
> > Apparmor namespaces, for instance, are completely orthogonal to user
> > namespaces, as are contexts in selinux.
> > 
> > Now, if smack or selinux xattrs are being set then those modules
> > should be gating these writes.  Booting a kernel without those
> > modules should be a challenge for an untrusted user.  But such a
> > situation could be exploited opportunistically if it were to happen.
> > 
> > The problem with simply not changing this here is that if selinux
> > or smack authorizes the xattr write, then commoncap shouldn't be
> > denying it.
> 
> This is partly the logic behind the change, the other part being that
> the user could already insert the xattrs directly into the backing store
> so the LSMs must be prepared not to trust them in any case. But the
> commit message doesn't explain that, my mistake. And it's a question
> worth revisiting.
> 
> > I get the feeling we need cooperation among the modules (i.e. "if
> > the write is to 'security.$lsm' and $lsm is not loaded, then require
> > capable(CAP_SYS_ADMIN), else just allow)  But that's not how things are
> > structured right now.
> > 
> > Maybe security.ko could grow central logic to 'assign' security.*
> > capabilities to specific lsms and gate writes to those if $lsm is not
> > loaded.
> 
> I don't see any meaningful difference between this case and the case of
> inserting them into the backing store before mounting. We can't do
> anything to prevent the latter, so LSMs just have to be aware of
> unprivileged mounts and handle them with care. Previous patches do this
> for SELinux and Smack by adopting a policy that doesn't respect security
> labels on disk for these mounts. So I don't think that refusing to set
> security.* xattrs for an LSM that isn't loaded really accomplishes
> anything.

Good point.  I think that's the thing to point in the patch description.
(The original patch description doesn't mention any change apart from
file capabilities, which I think it should)

> Then there's the case of setting xattrs for an LSM that is currently
> loaded. I think that SELinux and Smack are both going to refuse these
> writes, Smack rather directly by seeing that the user lacks global
> CAP_MAC_ADMIN and SELinux by virtue of the fact that the previous patch
> in this series applies mountpoint labeling to these mounts. As far as I
> can tell the other LSMs don't take security policy from xattrs.
> 
> So, as far as I can tell, removing the check doesn't create any
> vulnerabilities.
> 
> But that's not to say it's the right thing to do. After reconsidering
> it, I'm inclined to be more conservative and to keep requiring
> capable(CAP_SYS_ADMIN) until such time as there's a use case for
> allowing a user privileged only in s_user_ns to write these xattrs.
> 
> > Does anything break if the second hunk in each fn in this patch is
> > not applied?
> 
> Not that I'm aware of, no.

That's ok, let's leave the patch as is, with updated description.

Acked-by: Serge 

Re: [PATCH 14/19] fs: Permit FIBMAP for users with CAP_SYS_RAWIO in s_user_ns

2015-12-04 Thread Theodore Ts'o
On Fri, Dec 04, 2015 at 02:45:32PM -0600, Seth Forshee wrote:
> On Fri, Dec 04, 2015 at 02:07:36PM -0600, Serge E. Hallyn wrote:
> > Heh, I was looking over 
> > http://www.gossamer-threads.com/lists/linux/kernel/103611
> > a little while ago :)  The same question was asked 16 years ago.  Apparently
> > the answer then was that it was easier than fixing the code.
> 
> So it seems then that either it still isn't safe and so unprivileged
> users shouldn't be allowed to do it at all, or else it's safe and we
> should drop the requirement completely. I can't say which is right,
> unfortunately.

It may not have been safe 16 years agoo, but giving invalid arguments
to FIBMAP is safe for ext4 and ext2.  This is the sort of thing that
tools like trinity should and does test for, so I think it should be
fine to remove the root check for FIBMAP.

- Ted
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 15/19] capabilities: Allow privileged user in s_user_ns to set file caps

2015-12-04 Thread Seth Forshee
On Fri, Dec 04, 2015 at 01:42:06PM -0600, Serge E. Hallyn wrote:
> Quoting Seth Forshee (seth.fors...@canonical.com):
> > A privileged user in a super block's s_user_ns is privileged
> > towards that file system and thus should be allowed to set file
> > capabilities. The file capabilities will not be trusted outside
> > of s_user_ns, so an unprivileged user cannot use this to gain
> > privileges in a user namespace where they are not already
> > privileged.
> > 
> > Signed-off-by: Seth Forshee 
> > ---
> >  security/commoncap.c | 12 
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> > 
> > diff --git a/security/commoncap.c b/security/commoncap.c
> > index 2119421613f6..d6c80c19c449 100644
> > --- a/security/commoncap.c
> > +++ b/security/commoncap.c
> > @@ -653,15 +653,17 @@ int cap_bprm_secureexec(struct linux_binprm *bprm)
> >  int cap_inode_setxattr(struct dentry *dentry, const char *name,
> >const void *value, size_t size, int flags)
> >  {
> > +   struct user_namespace *user_ns = dentry->d_sb->s_user_ns;
> > +
> > if (!strcmp(name, XATTR_NAME_CAPS)) {
> > -   if (!capable(CAP_SETFCAP))
> > +   if (!ns_capable(user_ns, CAP_SETFCAP))
> > return -EPERM;
> 
> This, for file capabilities, is fine,
> 
> > return 0;
> > }
> >  
> > if (!strncmp(name, XATTR_SECURITY_PREFIX,
> >  sizeof(XATTR_SECURITY_PREFIX) - 1) &&
> > -   !capable(CAP_SYS_ADMIN))
> > +   !ns_capable(user_ns, CAP_SYS_ADMIN))
> 
> but this is for all other security.*.
> 
> It's probably still ok, but let's think about it a sec.  MAC like
> selinux or smack should be orthogonal to DAC.  Capabilities are the
> same in essence, but the reason they can be treated differently here
> is because capabilties are in fact targated at a user namespace.
> Apparmor namespaces, for instance, are completely orthogonal to user
> namespaces, as are contexts in selinux.
> 
> Now, if smack or selinux xattrs are being set then those modules
> should be gating these writes.  Booting a kernel without those
> modules should be a challenge for an untrusted user.  But such a
> situation could be exploited opportunistically if it were to happen.
> 
> The problem with simply not changing this here is that if selinux
> or smack authorizes the xattr write, then commoncap shouldn't be
> denying it.

This is partly the logic behind the change, the other part being that
the user could already insert the xattrs directly into the backing store
so the LSMs must be prepared not to trust them in any case. But the
commit message doesn't explain that, my mistake. And it's a question
worth revisiting.

> I get the feeling we need cooperation among the modules (i.e. "if
> the write is to 'security.$lsm' and $lsm is not loaded, then require
> capable(CAP_SYS_ADMIN), else just allow)  But that's not how things are
> structured right now.
> 
> Maybe security.ko could grow central logic to 'assign' security.*
> capabilities to specific lsms and gate writes to those if $lsm is not
> loaded.

I don't see any meaningful difference between this case and the case of
inserting them into the backing store before mounting. We can't do
anything to prevent the latter, so LSMs just have to be aware of
unprivileged mounts and handle them with care. Previous patches do this
for SELinux and Smack by adopting a policy that doesn't respect security
labels on disk for these mounts. So I don't think that refusing to set
security.* xattrs for an LSM that isn't loaded really accomplishes
anything.

Then there's the case of setting xattrs for an LSM that is currently
loaded. I think that SELinux and Smack are both going to refuse these
writes, Smack rather directly by seeing that the user lacks global
CAP_MAC_ADMIN and SELinux by virtue of the fact that the previous patch
in this series applies mountpoint labeling to these mounts. As far as I
can tell the other LSMs don't take security policy from xattrs.

So, as far as I can tell, removing the check doesn't create any
vulnerabilities.

But that's not to say it's the right thing to do. After reconsidering
it, I'm inclined to be more conservative and to keep requiring
capable(CAP_SYS_ADMIN) until such time as there's a use case for
allowing a user privileged only in s_user_ns to write these xattrs.

> Does anything break if the second hunk in each fn in this patch is
> not applied?

Not that I'm aware of, no.

Seth
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/19] fs: Permit FIBMAP for users with CAP_SYS_RAWIO in s_user_ns

2015-12-04 Thread Seth Forshee
On Fri, Dec 04, 2015 at 02:07:36PM -0600, Serge E. Hallyn wrote:
> Heh, I was looking over 
> http://www.gossamer-threads.com/lists/linux/kernel/103611
> a little while ago :)  The same question was asked 16 years ago.  Apparently
> the answer then was that it was easier than fixing the code.

So it seems then that either it still isn't safe and so unprivileged
users shouldn't be allowed to do it at all, or else it's safe and we
should drop the requirement completely. I can't say which is right,
unfortunately.

> 
> Quoting Theodore Ts'o (ty...@mit.edu):
> > The fact that we need CAP_SYS_RAIO for FIBMAP is pretty silly, given
> > that FIEMAP does not require privileges --- and in fact the preferred
> > interface.  Why not just simply drop the requirement for privileges
> > for FIBMAP?
> > 
> > (Seth, Serge, this isn't a real objection to your patch; but the fact
> > that FIBMAP requires root has always been a bit silly, and this would
> > be a great opportunity to simplify things a bit.)
> > 
> > - Ted
> > 
> > 
> > On Fri, Dec 04, 2015 at 01:11:43PM -0600, Serge E. Hallyn wrote:
> > > Quoting Seth Forshee (seth.fors...@canonical.com):
> > > > Signed-off-by: Seth Forshee 
> > > 
> > > Acked-by: Serge Hallyn 
> > > 
> > > > ---
> > > >  fs/ioctl.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/fs/ioctl.c b/fs/ioctl.c
> > > > index 5d01d2638ca5..45c371bed7ee 100644
> > > > --- a/fs/ioctl.c
> > > > +++ b/fs/ioctl.c
> > > > @@ -55,7 +55,7 @@ static int ioctl_fibmap(struct file *filp, int __user 
> > > > *p)
> > > > /* do we support this mess? */
> > > > if (!mapping->a_ops->bmap)
> > > > return -EINVAL;
> > > > -   if (!capable(CAP_SYS_RAWIO))
> > > > +   if (!ns_capable(filp->f_inode->i_sb->s_user_ns, CAP_SYS_RAWIO))
> > > > return -EPERM;
> > > > res = get_user(block, p);
> > > > if (res)
> > > > -- 
> > > > 1.9.1
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" 
> > > > in
> > > > the body of a message to majord...@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > Please read the FAQ at  http://www.tux.org/lkml/
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" 
> > > in
> > > the body of a message to majord...@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/19] fs: Permit FIBMAP for users with CAP_SYS_RAWIO in s_user_ns

2015-12-04 Thread Serge E. Hallyn
On Fri, Dec 04, 2015 at 06:11:52PM -0500, Theodore Ts'o wrote:
> On Fri, Dec 04, 2015 at 02:45:32PM -0600, Seth Forshee wrote:
> > On Fri, Dec 04, 2015 at 02:07:36PM -0600, Serge E. Hallyn wrote:
> > > Heh, I was looking over 
> > > http://www.gossamer-threads.com/lists/linux/kernel/103611
> > > a little while ago :)  The same question was asked 16 years ago.  
> > > Apparently
> > > the answer then was that it was easier than fixing the code.
> > 
> > So it seems then that either it still isn't safe and so unprivileged
> > users shouldn't be allowed to do it at all, or else it's safe and we
> > should drop the requirement completely. I can't say which is right,
> > unfortunately.
> 
> It may not have been safe 16 years agoo, but giving invalid arguments
> to FIBMAP is safe for ext4 and ext2.  This is the sort of thing that
> tools like trinity should and does test for, so I think it should be
> fine to remove the root check for FIBMAP.

Seth, can I tempt you into sending a standalone patch to remove that? :)

-serge
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/19] fs: Permit FIBMAP for users with CAP_SYS_RAWIO in s_user_ns

2015-12-04 Thread Andreas Dilger

> On Dec 4, 2015, at 4:11 PM, Theodore Ts'o  wrote:
> 
> On Fri, Dec 04, 2015 at 02:45:32PM -0600, Seth Forshee wrote:
>> On Fri, Dec 04, 2015 at 02:07:36PM -0600, Serge E. Hallyn wrote:
>>> Heh, I was looking over 
>>> http://www.gossamer-threads.com/lists/linux/kernel/103611
>>> a little while ago :)  The same question was asked 16 years ago.  Apparently
>>> the answer then was that it was easier than fixing the code.
>> 
>> So it seems then that either it still isn't safe and so unprivileged
>> users shouldn't be allowed to do it at all, or else it's safe and we
>> should drop the requirement completely. I can't say which is right,
>> unfortunately.
> 
> It may not have been safe 16 years agoo, but giving invalid arguments
> to FIBMAP is safe for ext4 and ext2.  This is the sort of thing that
> tools like trinity should and does test for, so I think it should be
> fine to remove the root check for FIBMAP.

You can use FIEMAP on regular files and directories without special permission:

$ filefrag -v /etc
Filesystem type is: ef53
File size of /etc is 12288 (3 blocks of 4096 bytes)
 ext: logical_offset:physical_offset: length:   expected: flags:
   0:0..   0:8396832..   8396832:  1:
   1:1..   2:8397051..   8397052:  2:8396833: last,eof
/etc: 2 extents found


FIEMAP also has the benefit that you don't need to call it millions of times
for large files, like is needed for FIBMAP.

Cheers, Andreas







signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [PATCH 14/19] fs: Permit FIBMAP for users with CAP_SYS_RAWIO in s_user_ns

2015-12-04 Thread Seth Forshee
On Fri, Dec 04, 2015 at 05:43:49PM -0600, Serge E. Hallyn wrote:
> On Fri, Dec 04, 2015 at 06:11:52PM -0500, Theodore Ts'o wrote:
> > On Fri, Dec 04, 2015 at 02:45:32PM -0600, Seth Forshee wrote:
> > > On Fri, Dec 04, 2015 at 02:07:36PM -0600, Serge E. Hallyn wrote:
> > > > Heh, I was looking over 
> > > > http://www.gossamer-threads.com/lists/linux/kernel/103611
> > > > a little while ago :)  The same question was asked 16 years ago.  
> > > > Apparently
> > > > the answer then was that it was easier than fixing the code.
> > > 
> > > So it seems then that either it still isn't safe and so unprivileged
> > > users shouldn't be allowed to do it at all, or else it's safe and we
> > > should drop the requirement completely. I can't say which is right,
> > > unfortunately.
> > 
> > It may not have been safe 16 years agoo, but giving invalid arguments
> > to FIBMAP is safe for ext4 and ext2.  This is the sort of thing that
> > tools like trinity should and does test for, so I think it should be
> > fine to remove the root check for FIBMAP.
> 
> Seth, can I tempt you into sending a standalone patch to remove that? :)

Patch sent. I'll drop this patch in v2.
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 10/19] fs: Update posix_acl support to handle user namespace mounts

2015-12-04 Thread Serge E. Hallyn
Quoting Seth Forshee (seth.fors...@canonical.com):
> ids in on-disk ACLs should be converted to s_user_ns instead of
> init_user_ns as is done now. This introduces the possibility for
> id mappings to fail, and when this happens syscalls will return
> EOVERFLOW.
> 
> Signed-off-by: Seth Forshee 

Acked-by: Serge Hallyn 

> ---
>  fs/posix_acl.c  | 67 
> ++---
>  fs/xattr.c  | 19 +---
>  include/linux/posix_acl_xattr.h | 17 ---
>  3 files changed, 70 insertions(+), 33 deletions(-)
> 
> diff --git a/fs/posix_acl.c b/fs/posix_acl.c
> index 4adde1e2cbec..a29442eb4af8 100644
> --- a/fs/posix_acl.c
> +++ b/fs/posix_acl.c
> @@ -595,59 +595,77 @@ EXPORT_SYMBOL_GPL(posix_acl_create);
>  /*
>   * Fix up the uids and gids in posix acl extended attributes in place.
>   */
> -static void posix_acl_fix_xattr_userns(
> +static int posix_acl_fix_xattr_userns(
>   struct user_namespace *to, struct user_namespace *from,
>   void *value, size_t size)
>  {
>   posix_acl_xattr_header *header = (posix_acl_xattr_header *)value;
>   posix_acl_xattr_entry *entry = (posix_acl_xattr_entry *)(header+1), 
> *end;
>   int count;
> - kuid_t uid;
> - kgid_t gid;
> + kuid_t kuid;
> + uid_t uid;
> + kgid_t kgid;
> + gid_t gid;
>  
>   if (!value)
> - return;
> + return 0;
>   if (size < sizeof(posix_acl_xattr_header))
> - return;
> + return 0;
>   if (header->a_version != cpu_to_le32(POSIX_ACL_XATTR_VERSION))
> - return;
> + return 0;
>  
>   count = posix_acl_xattr_count(size);
>   if (count < 0)
> - return;
> + return 0;
>   if (count == 0)
> - return;
> + return 0;
>  
>   for (end = entry + count; entry != end; entry++) {
>   switch(le16_to_cpu(entry->e_tag)) {
>   case ACL_USER:
> - uid = make_kuid(from, le32_to_cpu(entry->e_id));
> - entry->e_id = cpu_to_le32(from_kuid(to, uid));
> + kuid = make_kuid(from, le32_to_cpu(entry->e_id));
> + if (!uid_valid(kuid))
> + return -EOVERFLOW;
> + uid = from_kuid(to, kuid);
> + if (uid == (uid_t)-1)
> + return -EOVERFLOW;
> + entry->e_id = cpu_to_le32(uid);
>   break;
>   case ACL_GROUP:
> - gid = make_kgid(from, le32_to_cpu(entry->e_id));
> - entry->e_id = cpu_to_le32(from_kgid(to, gid));
> + kgid = make_kgid(from, le32_to_cpu(entry->e_id));
> + if (!gid_valid(kgid))
> + return -EOVERFLOW;
> + gid = from_kgid(to, kgid);
> + if (gid == (gid_t)-1)
> + return -EOVERFLOW;
> + entry->e_id = cpu_to_le32(gid);
>   break;
>   default:
>   break;
>   }
>   }
> +
> + return 0;
>  }
>  
> -void posix_acl_fix_xattr_from_user(void *value, size_t size)
> +int
> +posix_acl_fix_xattr_from_user(struct user_namespace *target_ns, void *value,
> +   size_t size)
>  {
> - struct user_namespace *user_ns = current_user_ns();
> - if (user_ns == _user_ns)
> - return;
> - posix_acl_fix_xattr_userns(_user_ns, user_ns, value, size);
> + struct user_namespace *source_ns = current_user_ns();
> + if (source_ns == target_ns)
> + return 0;
> + return posix_acl_fix_xattr_userns(target_ns, source_ns, value, size);
>  }
>  
> -void posix_acl_fix_xattr_to_user(void *value, size_t size)
> +int
> +posix_acl_fix_xattr_to_user(struct user_namespace *source_ns, void *value,
> + size_t size)
>  {
> - struct user_namespace *user_ns = current_user_ns();
> - if (user_ns == _user_ns)
> - return;
> - posix_acl_fix_xattr_userns(user_ns, _user_ns, value, size);
> + struct user_namespace *target_ns = current_user_ns();
> + if (target_ns == source_ns)
> + return 0;
> + return posix_acl_fix_xattr_userns(target_ns, source_ns, value, size);
>  }
>  
>  /*
> @@ -782,7 +800,7 @@ posix_acl_xattr_get(const struct xattr_handler *handler,
>   if (acl == NULL)
>   return -ENODATA;
>  
> - error = posix_acl_to_xattr(_user_ns, acl, value, size);
> + error = posix_acl_to_xattr(dentry->d_sb->s_user_ns, acl, value, size);
>   posix_acl_release(acl);
>  
>   return error;
> @@ -810,7 +828,8 @@ posix_acl_xattr_set(const struct xattr_handler *handler,
>   return -EPERM;
>  
>   if (value) {
> - acl = 

Re: [PATCH 11/19] fs: Ensure the mounter of a filesystem is privileged towards its inodes

2015-12-04 Thread Serge E. Hallyn
Quoting Seth Forshee (seth.fors...@canonical.com):
> The mounter of a filesystem should be privileged towards the
> inodes of that filesystem. Extend the checks in
> inode_owner_or_capable() and capable_wrt_inode_uidgid() to
> permit access by users priviliged in the user namespace of the
> inode's superblock.
> 
> Signed-off-by: Seth Forshee 

Acked-by: Serge Hallyn 

> ---
>  fs/inode.c  |  3 +++
>  kernel/capability.c | 13 +
>  2 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 1be5f9003eb3..01c036fe1950 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1962,6 +1962,9 @@ bool inode_owner_or_capable(const struct inode *inode)
>   ns = current_user_ns();
>   if (ns_capable(ns, CAP_FOWNER) && kuid_has_mapping(ns, inode->i_uid))
>   return true;
> +
> + if (ns_capable(inode->i_sb->s_user_ns, CAP_FOWNER))
> + return true;
>   return false;
>  }
>  EXPORT_SYMBOL(inode_owner_or_capable);
> diff --git a/kernel/capability.c b/kernel/capability.c
> index 45432b54d5c6..5137a38a5670 100644
> --- a/kernel/capability.c
> +++ b/kernel/capability.c
> @@ -437,13 +437,18 @@ EXPORT_SYMBOL(file_ns_capable);
>   *
>   * Return true if the current task has the given capability targeted at
>   * its own user namespace and that the given inode's uid and gid are
> - * mapped into the current user namespace.
> + * mapped into the current user namespace, or if the current task has
> + * the capability towards the user namespace of the inode's superblock.
>   */
>  bool capable_wrt_inode_uidgid(const struct inode *inode, int cap)
>  {
> - struct user_namespace *ns = current_user_ns();
> + struct user_namespace *ns;
>  
> - return ns_capable(ns, cap) && kuid_has_mapping(ns, inode->i_uid) &&
> - kgid_has_mapping(ns, inode->i_gid);
> + ns = current_user_ns();
> + if (ns_capable(ns, cap) && kuid_has_mapping(ns, inode->i_uid) &&
> + kgid_has_mapping(ns, inode->i_gid))
> + return true;
> +
> + return ns_capable(inode->i_sb->s_user_ns, cap);
>  }
>  EXPORT_SYMBOL(capable_wrt_inode_uidgid);
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 12/19] fs: Don't remove suid for CAP_FSETID in s_user_ns

2015-12-04 Thread Serge E. Hallyn
Quoting Seth Forshee (seth.fors...@canonical.com):
> Expand the check in should_remove_suid() to keep privileges for
> CAP_FSETID in s_user_ns rather than init_user_ns.
> 
> Signed-off-by: Seth Forshee 

Acked-by: Serge Hallyn 

> ---
>  fs/inode.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 01c036fe1950..3e7c74da9304 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1684,7 +1684,8 @@ int should_remove_suid(struct dentry *dentry)
>   if (unlikely((mode & S_ISGID) && (mode & S_IXGRP)))
>   kill |= ATTR_KILL_SGID;
>  
> - if (unlikely(kill && !capable(CAP_FSETID) && S_ISREG(mode)))
> + if (unlikely(kill && !ns_capable(dentry->d_sb->s_user_ns, CAP_FSETID) &&
> +  S_ISREG(mode)))
>   return kill;
>  
>   return 0;
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 13/19] fs: Allow superblock owner to access do_remount_sb()

2015-12-04 Thread Serge E. Hallyn
Quoting Seth Forshee (seth.fors...@canonical.com):
> Superblock level remounts are currently restricted to global
> CAP_SYS_ADMIN, as is the path for changing the root mount to
> read only on umount. Loosen both of these permission checks to
> also allow CAP_SYS_ADMIN in any namespace which is privileged
> towards the userns which originally mounted the filesystem.
> 
> Signed-off-by: Seth Forshee 
> Acked-by: "Eric W. Biederman" 
> ---

Acked-by: Serge Hallyn 

>  fs/namespace.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 18fc58760aec..b00a765895e7 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -1510,7 +1510,7 @@ static int do_umount(struct mount *mnt, int flags)
>* Special case for "unmounting" root ...
>* we just try to remount it readonly.
>*/
> - if (!capable(CAP_SYS_ADMIN))
> + if (!ns_capable(sb->s_user_ns, CAP_SYS_ADMIN))
>   return -EPERM;
>   down_write(>s_umount);
>   if (!(sb->s_flags & MS_RDONLY))
> @@ -2199,7 +2199,7 @@ static int do_remount(struct path *path, int flags, int 
> mnt_flags,
>   down_write(>s_umount);
>   if (flags & MS_BIND)
>   err = change_mount_flags(path->mnt, flags);
> - else if (!capable(CAP_SYS_ADMIN))
> + else if (!ns_capable(sb->s_user_ns, CAP_SYS_ADMIN))
>   err = -EPERM;
>   else
>   err = do_remount_sb(sb, flags, data, 0);
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/19] fs: Permit FIBMAP for users with CAP_SYS_RAWIO in s_user_ns

2015-12-04 Thread Serge E. Hallyn
Quoting Seth Forshee (seth.fors...@canonical.com):
> Signed-off-by: Seth Forshee 

Acked-by: Serge Hallyn 

> ---
>  fs/ioctl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/ioctl.c b/fs/ioctl.c
> index 5d01d2638ca5..45c371bed7ee 100644
> --- a/fs/ioctl.c
> +++ b/fs/ioctl.c
> @@ -55,7 +55,7 @@ static int ioctl_fibmap(struct file *filp, int __user *p)
>   /* do we support this mess? */
>   if (!mapping->a_ops->bmap)
>   return -EINVAL;
> - if (!capable(CAP_SYS_RAWIO))
> + if (!ns_capable(filp->f_inode->i_sb->s_user_ns, CAP_SYS_RAWIO))
>   return -EPERM;
>   res = get_user(block, p);
>   if (res)
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html