Re: [PATCH] userns/capability: Add user namespace capability

2015-10-22 Thread Eric W. Biederman
Andy Lutomirski  writes:

> At the risk of pointing out a can of worms, the attack surface also
> includes things like the iptables configuration APIs, parsers, and
> filter/conntrack/action modules.

It is worth noting that module auto-load does not happen if the
triggering code does not have the proper permissions in the initial user
namespace.

I agree that is another piece of code that should be counted.  How that
compares to the other 130,000 or so lines of code in the network stack
that an unprivileged user can already cause to be exercised, I don't know.
In my back of the napkin swag I had totally forgotten to count anything
in the network stack.

A lot of the netfilter code that I have read and looked at is
comparatively simple and clean so I don't expect there is much risk
except from sheer volume of code there.

It is also tricky to count because the entire network side of the
networking stack is exposed to hostile users on the internet so anything
except the configuration is already exposed to hostile users.  The
average check entry is 15-20 lines long.  There appear to be 117 unique
check entry functions in the kernel so there may be another 2.5k lines of
code there.

Hmm.  And we have not had any design issues with the network stack.

Absent design issues, where the code even when implemented correctly
has the wrong semantics, we are left with the probability of exploitable
buggy code.  I suspect we have enough code even without user namespaces
enabled that the probability of exploitable buggy code somewhere in the
code that unprivileged users can cause to be exercised is > 50%.

I wonder if there are any good statistical models that give realistic
estimates of those things.
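
One deliberately naive way to put numbers on this, offered only as a
sketch: treat each thousand lines of reachable code as independently
containing an exploitable bug with some small probability p, so that
P(at least one) = 1 - (1 - p)^n.  Independence and the per-kLOC rate are
assumptions made for illustration, not measurements; only the 130,000
line figure and the 117 check entry functions at 15-20 lines each come
from the message above.

```python
import math

def p_any_exploitable(kloc: float, p_per_kloc: float) -> float:
    """P(at least one exploitable bug) under an independence assumption."""
    return 1.0 - (1.0 - p_per_kloc) ** kloc

def kloc_needed(p_per_kloc: float, target: float = 0.5) -> float:
    """How many kLOC it takes for the probability to reach `target`."""
    return math.log(1.0 - target) / math.log(1.0 - p_per_kloc)

# Figures quoted in the message.
CHECK_ENTRY_FUNCS = 117
low_lines = CHECK_ENTRY_FUNCS * 15    # 1755
high_lines = CHECK_ENTRY_FUNCS * 20   # 2340

# ASSUMPTION: a 1% chance of an exploitable bug per 1000 lines.
# This rate is hypothetical; plug in whatever estimate you trust.
ASSUMED_RATE = 0.01
network_stack = p_any_exploitable(130.0, ASSUMED_RATE)
with_check_entries = p_any_exploitable(130.0 + high_lines / 1000.0, ASSUMED_RATE)
```

Under that assumed rate the existing 130 kLOC already dominates the
result, and the extra couple of thousand lines of check entry code move
the estimate only slightly.  A realistic model would have to drop the
independence assumption and weight code by how often it is audited.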

Eric
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5] fs: Verify access of user towards block device file when mounting

2015-10-01 Thread Eric W. Biederman
Mike Snitzer  writes:

> What layer establishes access rights to historically root-only
> privileged block devices?  Is it user namespaces?

Block devices are weird.

Mounts historically have not checked the permissions on the block
devices because a mounter has CAP_SYS_ADMIN.

Unprivileged users are allowed to read/write block devices if
someone has given them permissions on the device node in the
filesystem.

The thinking with this patchset is to start performing the normal
block device access permission checks when mounting filesystems
when the mounter does not have the global CAP_SYS_ADMIN permission.

The truth is we are not much past the point of realizing that there were
no permission checks to use the actual block device passed in to mount,
so we could still be missing something. There is a lot going on with dm,
md, and lvm.  I don't know if the model of just looking at the block device
inode and performing the permission checks is good enough.
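
For readers less familiar with what "the normal block device access
permission checks" amount to, here is a userspace sketch of the classic
owner/group/other mode test applied to a device node's inode.  It is an
illustration only: it assumes plain mode bits, ignores ACLs, security
modules, and root's override, and uses the effective uid where the kernel
would use the fsuid.  The function names are hypothetical and are not
taken from the patchset.

```python
import os
import stat

def mode_allows(st_mode, st_uid, st_gid, uid, gids, want_read, want_write):
    """Classic UNIX check: exactly one of owner/group/other applies."""
    if uid == st_uid:
        r_bit, w_bit = stat.S_IRUSR, stat.S_IWUSR
    elif st_gid in gids:
        r_bit, w_bit = stat.S_IRGRP, stat.S_IWGRP
    else:
        r_bit, w_bit = stat.S_IROTH, stat.S_IWOTH
    if want_read and not (st_mode & r_bit):
        return False
    if want_write and not (st_mode & w_bit):
        return False
    return True

def may_use_block_device(path, want_write=True):
    """Would the calling user be allowed to open this block device node?"""
    st = os.stat(path)
    if not stat.S_ISBLK(st.st_mode):
        return False  # not a block device at all
    gids = set(os.getgroups()) | {os.getegid()}
    return mode_allows(st.st_mode, st.st_uid, st.st_gid,
                       os.geteuid(), gids, True, want_write)
```

A read-write mount would ask for both read and write access, while a
read-only mount would pass want_write=False.  Whether checking the
top-level node alone is sufficient for stacked devices is exactly the
open question raised above.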

> I haven't kept up with user namespaces as it relates to stacking block
> drivers like DM.  But I'm happy to come up to speed and at the same time
> help you verify all works as expected with DM block devices...

We are just getting there.  But if you can help that would be great.
The primary concern with dm is what happens when unprivileged users get
ahold of the code, and what happens when evil users corrupt the on-disk
format.

In principle dm like loop should be safe to use if there are not bugs
that make it unsafe for unprivileged users to access the code.

The goal, if possible, is to run things like docker without needing to be
root, or even more fun, to run docker in a container, and in general
to enable nested containers.

Eric


Re: [PATCH 1/2] namespaces: introduce sys_hijack (v10)

2007-11-30 Thread Eric W. Biederman
Serge E. Hallyn [EMAIL PROTECTED] writes:

 Quoting Eric W. Biederman ([EMAIL PROTECTED]):
 Mark Nelson [EMAIL PROTECTED] writes:
 
  Hi Paul and Eric,
 
  Do you guys have any objections to dropping the hijack_pid() and
  hijack_cgroup() parts of sys_hijack, leaving just hijack_ns() (see
  below for discussion)?
 
 I need to step back and study what is being proposed.
 
 My gut feeling is that you are proposing something that does not
 support forking me a process inside a container so I can have a
 shell without having to run a login program.

 Hmm, depends on exactly what you want, but you may be right.

 In terms of namespaces it'll be in the target container, including
 having a pid in the container.

Yes, which is generally what you want for a magic login shell.

 The most dangerous part about the purely ptrace method you mention is
 that pieces of the ptraced process' environment may leak, pollute,
 and attack your new process.  But it shouldn't be impossible to do
 it safely.  Just tedious.

Yes.  It is that use case more than anything I am concerned with.


 There is a reason I proposed ptrace as an initial prototype.
 
 All of the other uses of enter in a namespace context I feel confident
 we can support by just having proper virtual filesystems available
 to processes outside of the container.  For monitoring and control.

 I think you're showing an unhealthy amount of trust in both our ability
 to provide full fs-based controls to all filesystems and to your own and
 other people's abilities to never mess up a container.  As an example of
 the former, will you be able to create and configure a network interface
 or add iptables rules purely through fs interface? 

Well the fs interface for monitoring is pretty much on target.
As for iptables just get me a proper socket outside of the container
and I can control things.  (Pity we can't do Plan 9 style binds of file
descriptors into the mount namespace).

 As an example of the
 latter, one little mistake and your container's mounts ns may no longer
 be a slave of yours or of /containers/c_22/root.  It might take you
 years to figure out that all the time when you were doing

   mount --bind /mnt/nas /containers/c_22/root/mnt/backup
   echo 1 > /containers/c_22/root/root/backup-trigger
   read /containers/c_22/root/root/backup-callback
   umount /containers/c_22/root/mnt/backup

 your backups weren't going to your network storage but just being copied
 on local disk...

Yes, that could be nasty.

 BUT more importantly, it sounds like you are not interested in
 hijack_pid or hijack_cgroup, and Paul is only interested in
 hijack_ns.  So no one will mind if we dump the other two?  It
 should greatly simplify the patch!

I don't expect so.  So far I am confident that we can use filesystem
and file descriptor based interfaces from outside of a container
(which really is most of everything) with our current infrastructure.

Doing it that way seems to provide more natural access controls.

So I am mostly interested in some way to get a magic login shell
inside a chroot with a file descriptor that I have passed for
my input and output.  Make it a unix domain socket and I can
pass all of the file descriptors I want in and out of the little world.
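
The descriptor passing mentioned here is the SCM_RIGHTS mechanism of
unix domain sockets.  As a minimal sketch, the following sends one end
of a pipe across a socket pair and then writes through the received
copy.  The helper names and the single-process demo are assumptions for
illustration; in the use case above the two socket ends would sit on
opposite sides of the container boundary.

```python
import array
import os
import socket

def send_fds(sock, payload, fds):
    """Send payload plus open descriptors as SCM_RIGHTS ancillary data."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", fds))]
    return sock.sendmsg([payload], ancillary)

def recv_fds(sock, bufsize, maxfds):
    """Receive payload and any descriptors that were passed with it."""
    fds = array.array("i")
    payload, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(maxfds * fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            usable = len(data) - (len(data) % fds.itemsize)
            fds.frombytes(data[:usable])
    return payload, list(fds)

def demo():
    """Pass a pipe's write end over a socket pair, then write through it."""
    outside, inside = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        send_fds(outside, b"stdout", [write_end])
        _label, received = recv_fds(inside, 64, 1)
        os.write(received[0], b"hello from inside")
        os.close(received[0])
        return os.read(read_end, 64)
    finally:
        os.close(read_end)
        os.close(write_end)
        outside.close()
        inside.close()
```

The received descriptor is a new number in the receiver that refers to
the same open file, which is why a single socket is enough to hand a
shell its input and output.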

I like the concept of using something like sys_hijack for that,
rather than ptrace; it can be a lot less of a hack.

I will come back to this and look a bit more once we have the pid
and network namespaces in decent shape.  Thanks for keeping the
idea alive.

Eric



Re: [PATCH 1/2] namespaces: introduce sys_hijack (v10)

2007-11-29 Thread Eric W. Biederman
Mark Nelson [EMAIL PROTECTED] writes:

 Hi Paul and Eric,

 Do you guys have any objections to dropping the hijack_pid() and
 hijack_cgroup() parts of sys_hijack, leaving just hijack_ns() (see
 below for discussion)?

I need to step back and study what is being proposed.

My gut feeling is that you are proposing something that does not
support forking me a process inside a container so I can have a
shell without having to run a login program.

There is a reason I proposed ptrace as an initial prototype.

All of the other uses of enter in a namespace context I feel confident
we can support by just having proper virtual filesystems available
to processes outside of the container.  For monitoring and control.

Eric


Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel

2007-10-08 Thread Eric W. Biederman
Serge E. Hallyn [EMAIL PROTECTED] writes:

 Quoting Eric W. Biederman ([EMAIL PROTECTED]):
 
 Perform the split up you talked about above and move the table
 matching into the LSM hooks.
 
 Use something like the iptables action and match to module mapping
 code so we can have multiple modules compiled in and useable at the
 same time with the LSM hooks.
 
 I think it is firmly established that selling SElinux to everyone is
 politically untenable.  However enhancing the LSM (even if it is
 mostly selinux code movement down a layer) I think can be sold.
 
 If I could run Serge's isolation code and selinux rules at the same
 time that would be interesting. 

 But given that namespaces are making it upstream, what else is to be
 gained from the bsdjail module?  What exactly are you looking for?

Good question.  I keep tripping over the LSM hooks, and I have the
distinct impression that part of the current contention and lack of
agreement is simply the way things are currently factored.  So I'm
putting forth a constructive suggestion that has the possibility of
going somewhere.

 1. are you looking to cover all the corner cases - i.e. prevent killing
 a process in another namespace through F_SETOWN or mqueue, etc?

I'm looking towards this yes.  There are times when we deliberately
allow mixing of things by the definition of what namespaces are and
there are some use cases where people don't want this.

 2. are you looking for a potentially easier fix to the current absence
 of isolation in the user namespace?

No.  I'm not even worrying about the user namespace until it resembles
complete.  Currently I just view it as a stub because as is, the
security namespace is pretty much useless for any case I think about.
We still have way too many cases where the kernel treats different
names as the same name.

 3. are you just generally looking to make lsm/selinux easier for
 yourself to configure?

Well.  I'm trying to make the LSM more useful to hack on and configure,
and much less contentious for ordinary people to use.

There is one issue with sockets that has come up where there are
people who really want to filter things at connect and bind time.
The LSM is so inflexible the only sane suggestion at the time was
to duplicate the LSM hooks and add a new iptables-style table
for making that decision.

Also I'm thinking towards what we have to do to isolate the security
module stuff in the context of a namespace, so that a person in
a container can set up their own rules that further restrict the
system.

So far I'm not ready to do anything yet but I'm keeping a weather eye
on the situation so I have a clue where I'm going.

 If 1, an selinux policy should cover you.  So you can then skip to 3.
 Or, alternatively, I do plan - as soon as my free time clears up a bit -
 on demonstrating how to write some selinux policy to create a secure
 container based on current -mm + your experimental network namespace
 patches.

Thanks that sounds interesting.

 If 3, then selinux policy modules may actually help you, else either
 a new LSM (maybe like LIDS) or a userspace tool which is a front-end to
 selinux policy, emulating the iptables rules formats, may be what you
 want?

I don't want to have to choose my LSM at compile time.  I want to
add support into the kernel at compile time and be able to configure
it before I go multi-user.  I know this kind of architecture is
achievable because iptables allows it.

When I conceive of the security modules as just a firewall between
applications on my own box I think, oh yeah this is no big deal,
I might want to limit something that way some time.  These are just
some additional rules on when to return -EPERM.  So I ask myself why
is this situation much less flexible and much harder to use than our
network firewall code?
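
To make the firewall analogy concrete, here is a toy sketch of a
first-match rule table evaluated at a hook point, in the spirit of an
iptables chain.  Everything in it (the Rule and Chain names, the idea of
a "connect" chain, the request fields) is hypothetical and chosen for
illustration; it is not an existing LSM or netfilter interface.

```python
import errno
from dataclasses import dataclass, field
from typing import Callable, Dict, List

ALLOW = 0
DENY = -errno.EPERM   # "additional rules on when to return -EPERM"

@dataclass
class Rule:
    match: Callable[[Dict], bool]   # predicate over the request
    verdict: int                    # ALLOW or DENY

@dataclass
class Chain:
    policy: int = ALLOW             # verdict when no rule matches
    rules: List[Rule] = field(default_factory=list)

    def append(self, match, verdict):
        self.rules.append(Rule(match, verdict))

    def evaluate(self, request):
        """First matching rule decides; otherwise fall back to the policy."""
        for rule in self.rules:
            if rule.match(request):
                return rule.verdict
        return self.policy

# Hypothetical chain consulted when a task tries to connect a socket.
connect_chain = Chain()
connect_chain.append(lambda r: r["uid"] == 1000 and r["port"] < 1024, DENY)
```

Match functions play the role of iptables match modules: several
independent modules could contribute predicates to the same chain
without any one of them owning the hook.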

 My impression is that selinux is one monolithic blob that doesn't
 allow me to incrementally add matching or action features that I
 find interesting.

 Actually with policy modules it gets much much better.  I have in fact
 been able to pretty easily write a short policy module to, say, create
 an selinux user which ran as root and had full access to the system to
 do system setup for automated testing.  There is a learning curve in
 having to look at existing modules for maybe a few days to get started,
 but once you get started the policy modules do make it very easy to
 add to current policy.

Ok. Interesting.  Are these kernel modules?

Still, while I get the general impression that selinux seems to be
very close to a generic solution, and that selinux more or less has
the architecture we might want, I don't get the impression that
selinux does this at a level that is open to other people doing
interesting things.

So I still ask the question can we move this functionality down to
the LSM in a way that will solve the composition problem between
multiple security modules?

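It really seems to me that the LSM as currently structured creates
a large barrier to entry for people who have just this little thing
they want to do that is not possible with any existing security
module.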

Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel

2007-10-08 Thread Eric W. Biederman
Casey Schaufler [EMAIL PROTECTED] writes:

 --- Eric W. Biederman [EMAIL PROTECTED] wrote:


 Likely.  Until we have a generalized LSM interface with 1000 config
 options like netfilter I don't expect we will have grounds to talk
 or agree to a common user space interface.  Although I could be
 wrong.

 Gulp. I know that many of you are granularity advocates, but I
 have to say that security derived by tweaking 1000 knobs so that
 they are all just right seems a little far fetched to me. I see
 it as poopooing the 3rd and most important part of the reference
 monitor concept, small enough to analyze. Sure, you can analyse
 the 1000 individual checks, but you'll never be able to describe
 the system behavior as a whole.

Agreed.  I wasn't thinking 1000 individual checks but 1000 different
capabilities, which could be either checks or actions: basically
fundamentally different capabilities.  Things like CIPSO, or the ability
to store a security label on a file.  I would not expect most security
policies to use most of them.  Neither do I expect Orange book security to
necessarily be what people want to achieve with the LSM.  But I
haven't looked at it in enough detail to know how things should be
factored; in this case I was simply extrapolating from the iptables
experience, where we do have a very large number of options.

The real point is that I would be surprised if we could come
to an agreement on a common user space API when we can't agree on how
to compile all of the security modules into the kernel and have them
play nice with each other.

Assuming we can achieve security modules playing nice with each other
using a mechanism similar to iptables, then what needs to be evaluated
is the specific table configuration we are using on the system, not
the full general set of possibilities.  Further I expect that for the
truly security paranoid we want the option to disable further table
changes after the tables have been configured.
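
As a sketch of the "disable further table changes" option, the
following toy table accepts rules until it is sealed and refuses any
change afterwards, so that what gets evaluated is exactly one fixed
configuration.  The PolicyTable class and its method names are
hypothetical; nothing here corresponds to an existing kernel interface.

```python
class PolicyTable:
    """A rule table that becomes immutable once seal() is called."""

    def __init__(self):
        self._rules = []
        self._sealed = False

    def append(self, rule):
        if self._sealed:
            raise PermissionError("policy table is sealed")
        self._rules.append(rule)

    def seal(self):
        """One-way switch: there is deliberately no unseal()."""
        self._sealed = True

    def rules(self):
        """Immutable snapshot of the configuration to be evaluated."""
        return tuple(self._rules)
```

The one-way switch is the important design choice: an auditor only has
to reason about the sealed snapshot, not about every table that could
have been loaded.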

On another side, personally I don't see where the idea comes from that
you can describe system behavior as a whole without analyzing the
entire kernel.  Has there been work on a sparse-like tool that I'm
not aware of to ensure that we always perform the appropriate security
checks on the user/kernel interface boundary?

Eric


Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel

2007-10-08 Thread Eric W. Biederman
Serge E. Hallyn [EMAIL PROTECTED] writes:

 Quoting Eric W. Biederman ([EMAIL PROTECTED]):
 It really seems to me that the LSM as currently structured creates
 a large barrier to entry for people who have just this little thing
 they want to do that is not possible with any existing security
 module.

 Yes and it's been made increasingly so far particularly because of the
 perceived potential for 'abuse'.  So to be curt, allowing people like
 you describe to do something small and interesting is deemed far less
 important than making sure that the small thing they want to do fits
 within the LSM mandate and is not a non-upstream module.

 So that is the concern you would need to address before any other.

 Still, I do think that selinux policy modules may do just what you want.
 The main obstacle appears to be that the 'base' policy is so huge that
 it's tough to get started to do something small.

 You also might want to check out LIDS, as its rules are set up pretty
 much the way you seem to want.

To be very clear.  Enhancing the LSM is of interest to me as it looks
like that is a way to get people working and playing well together,
and that ultimately to be able to run a full distro in a container
I'm going to need this ability.

Examples of better ways to do this in selinux, LIDS, or SMACK are only
interesting as far as they suggest how to enhance the LSM.

I honestly think enhancing the LSM would actually reduce its ability
to be abused, because nothing would directly own the hook.

My very practical question:  How do I run selinux in one container,
and SMACK in another?

Eric


Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel

2007-10-08 Thread Eric W. Biederman
Casey Schaufler [EMAIL PROTECTED] writes:

 --- Eric W. Biederman [EMAIL PROTECTED] wrote:

 It really seems to me that the LSM as currently structured creates
 a large barrier to entry for people who have just this little thing
 they want to do that is not possible with any existing security
 module.

 I honestly think that the barrier has been more political
 in nature than technical. I don't know how long you've been
 watching, but no attempt to get an LSM upstream has escaped
 exaggerated criticism from certain factions. Only someone who wants
 to get cut to metaphorical ribbons would submit a little LSM.
 Maybe that will get better now. I sure hope so.

Yes.  Me too.  I certainly agree about the political part.

My only hope was to suggest something that might reduce what there is to
get political about.

Eric


Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel

2007-10-05 Thread Eric W. Biederman
Stephen Smalley [EMAIL PROTECTED] writes:

 On Fri, 2007-10-05 at 09:27 -0700, Casey Schaufler wrote:
 --- Kyle Moffett [EMAIL PROTECTED] wrote:
 
  On Oct 05, 2007, at 00:45:17, Eric W. Biederman wrote:
   Kyle Moffett [EMAIL PROTECTED] writes:
  
   On Oct 04, 2007, at 21:44:02, Eric W. Biederman wrote:

   Yes. Currently with containers we are taking that one step farther  
   as that solves a wider set of problems.

  So containers are exclusive subsets of the system while LSM should  
  be about non-exclusive information restriction.
 
 Yes. Isolation is a much simpler problem than access control.

Yes.  Simple isolation is a different and simpler problem that can be
solved with the LSM hooks today.  I brought it up for the contrast in
what the LSM hooks can be useful for.  Hopefully allowing the LSM
hooks to be perceived as something other than just hacks for selinux.

Using a security module for isolation is currently uninteresting
because it would preclude use of a security module like selinux or
smack, because we can have at most one security module at a time
loaded.

I have seen several other places where a custom LSM would have
been a good solution, but because we don't allow composition, solving
a little problem with the LSM is not interesting enough to allow
the code to be merged.

So I see the current structure of the LSM hooks as hindering
development.

  
   I think it is firmly established that selling SElinux to everyone  
   is politically untenable.  However enhancing the LSM (even if it is  
   mostly selinux code movement down a layer) I think can be sold.
 
 That would be silly. Smack uses a significantly smaller set of hooks
 than SELinux requires and still does interesting things. We went through
 the replace LSM with the SELinux interface exercise a couple years
 ago, I would hate to have to regurgitate all those discussions.

 I don't think Eric is proposing replacing LSM with the SELinux interface
 as it exists today, but rather making LSM more Netfilter-like and
 radically refactoring SELinux (and any other security module) to consist
 of a chain of smaller modules that are more general and reusable, and
 that can be composed and applied in interesting ways via an
 iptables-like interface.  I'm not sure what that would look like
 exactly, but it seems reasonable to explore.

Exactly: refactoring security modules into small simple reusable chunks
to allow reuse.  It might look something like selinux chains or it
might not.  Inherently it needs to expose what you can do at the
existing hook points, and it needs to allow usage by different modules
that are compiled in at the same time.

It is certainly the case that you would not need to use all of the
existing hooks to get something done.

 One of the things left unresolved with LSM is userland API, and it does
 involve more than just returning EPERM or EACCES to applications.  You
 already have patched ls and sshd programs, and have acknowledged the
 need for more userland modifications to ultimately achieve your own
 goals.  If LSM is going to succeed in the kernel, then ultimately you
 need some common API for userland so that you don't need separate
 versions of ls, ps, sshd, etc for Smack vs SELinux vs. whatever.

Likely.  Until we have a generalized LSM interface with 1000 config
options like netfilter I don't expect we will have grounds to talk
or agree to a common user space interface.  Although I could be
wrong.

Eric


Re: [patch] unprivileged mounts update

2007-04-25 Thread Eric W. Biederman
Serge E. Hallyn [EMAIL PROTECTED] writes:

 Quoting H. Peter Anvin ([EMAIL PROTECTED]):
 Miklos Szeredi wrote:
  
  Andrew, please skip this patch, for now.
  
  Serge found a problem with the fsuid approach: setfsuid(nonzero) will
  remove filesystem related capabilities.  So even if root is trying to
  set the user=UID flag on a mount, access to the target (and in case
  of bind, the source) is checked with user privileges.
  
  Root should be able to set this flag on any mountpoint, _regardless_
  of permissions.
  
 
 Right, if you're using fsuid != 0, you're not running as root 

 Sure, but what I'm not clear on is why, if I've done a
 prctl(PR_SET_KEEPCAPS, 1) before the setfsuid, I still lose the
 CAP_FS_MASK perms.  I see the special case handling in
 cap_task_post_setuid().  I'm sure there was a reason for it, but
 this is a piece of the capability implementation I don't understand
 right now.

So we drop CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH,
CAP_FOWNER, and CAP_FSETID.

Since we are checking CAP_SETUID or CAP_SYS_ADMIN how is that
a problem?
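
As a sketch of what that drop does to a capability set, the bit
arithmetic looks like the following.  The capability numbers are the
ones from linux/capability.h; the mask covers only the five capabilities
listed above, and the helper is a simplified model of the special case,
not a transcription of cap_task_post_setuid().

```python
# Capability numbers as defined in linux/capability.h.
CAP_CHOWN = 0
CAP_DAC_OVERRIDE = 1
CAP_DAC_READ_SEARCH = 2
CAP_FOWNER = 3
CAP_FSETID = 4
CAP_SETUID = 7
CAP_SYS_ADMIN = 21

def cap_bit(cap):
    """Bit for one capability within a 32-bit capability set."""
    return 1 << cap

# The filesystem-related capabilities named in this message.
CAP_FS_MASK = (cap_bit(CAP_CHOWN) | cap_bit(CAP_DAC_OVERRIDE) |
               cap_bit(CAP_DAC_READ_SEARCH) | cap_bit(CAP_FOWNER) |
               cap_bit(CAP_FSETID))

def after_setfsuid_nonzero(effective):
    """Simplified model: fsuid going from 0 to nonzero clears the FS caps."""
    return effective & ~CAP_FS_MASK

def has_cap(effective, cap):
    return bool(effective & cap_bit(cap))

full_set = (1 << 32) - 1
remaining = after_setfsuid_nonzero(full_set)
```

In this model CAP_SETUID and CAP_SYS_ADMIN survive the drop, which is
why a check against either of them still passes.  What is lost is
CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH, so any path lookup performed
afterwards is subject to ordinary directory permissions.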

Are there other permission checks that mount is doing that we
care about?


 (fsuid is
 the equivalent to euid for the filesystem.)

 If it were really the equivalent then I could keep my capabilities :)
 after changing it.

We drop all capabilities after we change the euid.

 I fail to see how ruid should have *any* impact on mount(2).  That seems
 to be a design flaw.

 May be, but just using fsuid at this point stops me from enabling user
 mounts under /share if /share is chmod 000 (which it is).

I'm dense today.  If we can't work out the details we can always use a flag.
But what is the problem with fsuid?

You are not trying to test this using a non-default security model, are you?


Eric


Re: [patch] unprivileged mounts update

2007-04-25 Thread Eric W. Biederman
Serge E. Hallyn [EMAIL PROTECTED] writes:

 Quoting Eric W. Biederman ([EMAIL PROTECTED]):
 
 Are there other permission checks that mount is doing that we
 care about.

 Not mount itself, but in looking up /share/fa/root/home/fa,
 user fa doesn't have the rights to read /share, and by setting
 fsuid to fa and dropping CAP_DAC_READ_SEARCH the mount action fails.

Got it. 

I'm not certain this is actually a problem; it may be a feature.
But it does fly in the face of the general principle of just
getting out of root's way so things can get done.

I think we can solve your basic problem by simply doing something like:
chdir("/share"); mount(".");
to avoid the permission problem.

The practical question is how much do we care.

 But the solution you outlined in your previous post would work around
 this perfectly.

If we are not using usual permissions, which user do we use?  current->uid?
Or do we pass that user someplace?

  If it were really the equivalent then I could keep my capabilities :)
  after changing it.
 
 We drop all capabilities after we change the euid.

 Not if we've done prctl(PR_SET_KEEPCAPS, 1)

Ah, cap_clear doesn't do the obvious thing.

Eric