Re: [PATCH] userns/capability: Add user namespace capability
Andy Lutomirski writes:

> At the risk of pointing out a can of worms, the attack surface also includes things like the iptables configuration APIs, parsers, and filter/conntrack/action modules.

It is worth noting that module auto-load does not happen if the triggering code does not have the proper permissions in the initial user namespace.

I agree that is another piece of code that should be counted. How that compares to the other 130,000 or so lines of code in the network stack that an unprivileged user can already cause to be exercised, I don't know. In my back-of-the-napkin swag I had totally forgotten to count anything in the network stack.

A lot of the netfilter code that I have read and looked at is comparatively simple and clean, so I don't expect there is much risk there except from the sheer volume of code. It is also tricky to count because the entire network side of the networking stack is exposed to hostile users on the internet, so anything except the configuration is already exposed to hostile users.

The average check entry is 15-20 lines long. There appear to be 117 unique check entry functions in the kernel, so there may be another 2.5k lines of code there.

Hmm. And we have not had any design issues with the network stack.

Absent design issues, where the code even when implemented correctly has the wrong semantics, we are left with the probability of exploitable buggy code. I suspect we have enough code even without user namespaces enabled that the probability of exploitable buggy code somewhere in the code that unprivileged users can cause to be exercised is > 50%.

I wonder if there are any good statistical models that give realistic estimates of those things.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
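Eric's closing question about statistical models can be explored with a deliberately naive one. The sketch below assumes that every line of code independently carries the same small chance of an exploitable bug, which real code certainly does not; `ASSUMED_RATE` is a hypothetical placeholder, not a measurement. The 117 check entry functions, the 15-20 line average, and the 130,000 line figure come from the message above.

```python
def prob_at_least_one_bug(lines_of_code, rate_per_line):
    """Chance of at least one exploitable bug, assuming independent lines."""
    return 1.0 - (1.0 - rate_per_line) ** lines_of_code


def rate_for_probability(target, lines_of_code):
    """Per-line rate at which the chance of at least one bug equals target."""
    return 1.0 - (1.0 - target) ** (1.0 / lines_of_code)


# Figures quoted in the message.
CHECK_ENTRY_FUNCS = 117
LOW_ESTIMATE = CHECK_ENTRY_FUNCS * 15    # lines, at 15 lines per function
HIGH_ESTIMATE = CHECK_ENTRY_FUNCS * 20   # lines, at 20 lines per function
NETWORK_STACK_LINES = 130_000

# Hypothetical rate, chosen only for illustration.
ASSUMED_RATE = 1e-5

p_network = prob_at_least_one_bug(NETWORK_STACK_LINES, ASSUMED_RATE)
break_even = rate_for_probability(0.5, NETWORK_STACK_LINES)

if __name__ == "__main__":
    print("check entry lines:", LOW_ESTIMATE, "to", HIGH_ESTIMATE)
    print("P(at least one bug) at assumed rate:", round(p_network, 3))
    print("rate giving a 50% chance:", break_even)
```

At 15 to 20 lines each, the 117 functions come to 1,755 to 2,340 lines, so the 2.5k in the message reads as a rounded-up estimate. Under this toy model the "> 50%" suspicion holds whenever the per-line rate exceeds `break_even`.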
Re: [PATCH 1/5] fs: Verify access of user towards block device file when mounting
Mike Snitzer writes:

> What layer establishes access rights to historically root-only privileged block devices? Is it user namespaces?

Block devices are weird. Mounts historically have not checked the permissions on the block devices because a mounter has CAP_SYS_ADMIN. Unprivileged users are allowed to read/write block devices if someone has given them permissions on the device node in the filesystem.

The thinking with this patchset is to start performing the normal block device access permission checks when mounting filesystems when the mounter does not have the global CAP_SYS_ADMIN permission.

The truth is we are not much past the point of realizing that there were no permission checks on the use of the actual block device passed in to mount, so we could still be missing something.

There is a lot going on with dm, md, and lvm. I don't know if the model of just looking at the block device inode and performing the permission checks is good enough.

> I haven't kept up with user namespaces as it relates to stacking block drivers like DM. But I'm happy to come up to speed and at the same time help you verify all works as expected with DM block devices...

We are just getting there. But if you can help that would be great.

The primary concern with dm is what happens when unprivileged users get ahold of the code, and what happens when evil users corrupt the on-disk format.

In principle dm, like loop, should be safe to use if there are no bugs that make it unsafe for unprivileged users to access the code.

The goal if possible is to run things like docker without needing to be root, or even more fun to run docker in a container, and in general to enable nested containers.

Eric
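The rule described in this message, checking the device node's ordinary permissions only when the mounter lacks global CAP_SYS_ADMIN, can be modelled from userspace. This is a sketch of the idea, not the patchset's kernel code: the function name `may_mount_from` and the boolean parameter are hypothetical, and requiring both read and write access is an assumption (a read-only mount might need less).

```python
import os
import stat


def may_mount_from(path, has_global_cap_sys_admin):
    """Userspace model of the proposed mount-time check on a device node."""
    st = os.stat(path)
    if not stat.S_ISBLK(st.st_mode):
        # Not a block device node at all.
        return False
    if has_global_cap_sys_admin:
        # Historical behaviour: no permission check on the device node.
        return True
    # Proposed behaviour: the normal access check on the device inode.
    # Assumption: read and write access are both required.
    return os.access(path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    # os.devnull is a character device, so it is rejected either way.
    print(may_mount_from(os.devnull, False))
```

As the message notes, a check on the inode alone may not be sufficient for stacked devices such as dm, md, and lvm; the model above does not address that.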
Re: [PATCH 1/2] namespaces: introduce sys_hijack (v10)
Serge E. Hallyn writes:

> Quoting Eric W. Biederman:
>> Mark Nelson writes:
>>
>>> Hi Paul and Eric,
>>>
>>> Do you guys have any objections to dropping the hijack_pid() and hijack_cgroup() parts of sys_hijack, leaving just hijack_ns() (see below for discussion)?
>>
>> I need to step back and study what is being proposed. My gut feeling is that you are proposing something that does not support forking me a process inside a container so I can have a shell without having to run a login program.
>
> Hmm, depends on exactly what you want, but you may be right. In terms of namespaces it'll be in the target container, including having a pid in the container.

Yes, which is generally what you want for a magic login shell.

> The most dangerous part about the purely ptrace method you mention is that pieces of the ptraced process' environment may leak, pollute, and attack your new process. But it shouldn't be impossible to do it safely. Just tedious.

Yes. It is that use case more than anything that I am concerned with.

>> There is a reason I proposed ptrace as an initial prototype. All of the other uses of enter in a namespace context I feel confident we can support by just having proper virtual filesystems available to processes outside of the container. For monitoring and control.
>
> I think you're showing an unhealthy amount of trust in both our ability to provide full fs-based controls to all filesystems and to your own and other people's abilities to never mess up a container.
>
> As an example of the former, will you be able to create and configure a network interface or add iptables rules purely through fs interface?

Well, the fs interface for monitoring is pretty much on target. As for iptables, just get me a proper socket outside of the container and I can control things. (Pity we can't do plan 9 style binds of file descriptors into the mount namespace.)
> As an example of the latter, one little mistake and your container's mounts ns may no longer be a slave of yours or of /containers/c_22/root. It might take you years to figure out that all the time when you were doing
>
>     mount --bind /mnt/nas /containers/c_22/root/mnt/backup
>     echo 1 > /containers/c_22/root/root/backup-trigger
>     read < /containers/c_22/root/root/backup-callback
>     umount /containers/c_22/root/mnt/backup
>
> your backups weren't going to your network storage but just being copied on local disk...

Yes, that could be nasty.

> BUT more importantly, it sounds like you are not interested in hijack_pid or hijack_cgroup, and Paul is only interested in hijack_ns. So no one will mind if we dump the other two? It should greatly simplify the patch!

I don't expect so. So far, filesystem and file descriptor based interfaces I am confident that we can use outside of a container (which really is most of everything), with our current infrastructure. Doing it that way seems to provide more natural access controls.

So I am mostly interested in some way to get a magic login shell inside a chroot with a file descriptor that I have passed for my input and output. Make it a unix domain socket and I can pass all of the file descriptors I want in and out of the little world.

I like the concept of using something like sys_hijack for that, rather than ptrace; it can be a lot less of a hack.

I will come back to this and look a bit more once we have the pid and network namespaces in decent shape. Thanks for keeping the idea alive.

Eric
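The file descriptor passing that Eric refers to is the standard SCM_RIGHTS mechanism on Unix domain sockets. A minimal sketch follows, using Python's `socket.sendmsg` and `socket.recvmsg`; the helper names `send_fd`, `recv_fd`, and `demo` are illustrative, and a pipe stands in for whatever descriptor a process inside a container would be handed.

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Send one open file descriptor over a Unix domain socket."""
    fds = array.array("i", [fd])
    # At least one byte of ordinary data must accompany the control message.
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock):
    """Receive one file descriptor sent with send_fd()."""
    fds = array.array("i")
    _data, ancdata, _flags, _addr = sock.recvmsg(
        1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing partial integer in the control data.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
            return fds[0]
    raise RuntimeError("no file descriptor received")


def demo():
    """Pass the write end of a pipe across a socketpair and use it."""
    outside, inside = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        send_fd(outside, write_end)
        received = recv_fd(inside)   # a new descriptor for the same pipe
        os.write(received, b"hello")
        os.close(received)
        return os.read(read_end, 5)
    finally:
        os.close(read_end)
        os.close(write_end)
        outside.close()
        inside.close()


if __name__ == "__main__":
    print(demo())
```

The receiver gets a new descriptor number that refers to the same open file, which is why access control can stay with whoever opened the file in the first place.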
Re: [PATCH 1/2] namespaces: introduce sys_hijack (v10)
Mark Nelson writes:

> Hi Paul and Eric,
>
> Do you guys have any objections to dropping the hijack_pid() and hijack_cgroup() parts of sys_hijack, leaving just hijack_ns() (see below for discussion)?

I need to step back and study what is being proposed. My gut feeling is that you are proposing something that does not support forking me a process inside a container so I can have a shell without having to run a login program.

There is a reason I proposed ptrace as an initial prototype. All of the other uses of enter in a namespace context I feel confident we can support by just having proper virtual filesystems available to processes outside of the container. For monitoring and control.

Eric
Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel
Serge E. Hallyn writes:

> Quoting Eric W. Biederman:
>> Perform the split up you talked about above and move the table matching into the LSM hooks.
>>
>> Use something like the iptables action and match to module mapping code so we can have multiple modules compiled in and usable at the same time with the LSM hooks.
>>
>> I think it is firmly established that selling SELinux to everyone is politically untenable. However enhancing the LSM (even if it is mostly selinux code movement down a layer) I think can be sold.
>>
>> If I could run Serge's isolation code and selinux rules at the same time that would be interesting.
>
> But given that namespaces are making it upstream, what else is to be gained from the bsdjail module?
>
> What exactly are you looking for?

Good question. I keep tripping over the LSM hooks, and I have the distinct impression that part of the current contention and lack of agreement is simply the way things are currently factored. So I'm putting forth a constructive suggestion that has the possibility of going somewhere.

> 1. are you looking to cover all the corner cases - i.e. prevent killing a process in another namespace through F_SETOWN or mqueue, etc?

I'm looking towards this, yes. There are times when we deliberately allow mixing of things by the definition of what namespaces are, and there are some use cases where people don't want this.

> 2. are you looking for a potentially easier fix to the current absence of isolation in the user namespace?

No. I'm not even worrying about the user namespace until it resembles complete. Currently I just view it as a stub because, as is, the security namespace is pretty much useless for any case I think about. We still have way too many cases where the kernel treats different names as the same name.

> 3. are you just generally looking to make lsm/selinux easier for yourself to configure?

Well. I'm trying to make the LSM more useful to hack on and configure, and much less contentious for ordinary people to use.
There is one issue with sockets that has come up where there are people who really want to filter things at connect and bind time. The LSM is so inflexible that the only sane suggestion at the time was to duplicate the LSM hooks and add a new iptables style table for making that decision.

Also I'm thinking towards what we have to do to isolate the security module stuff in the context of a namespace, so that a person in a container can set up their own rules that further restrict the system.

So far I'm not ready to do anything yet but I'm keeping a weather eye on the situation so I have a clue what I'm go.

> If 1, an selinux policy should cover you. So you can then skip to 3. Or, alternatively, I do plan - as soon as my free time clears up a bit - on demonstrating how to write some selinux policy to create a secure container based on current -mm + your experimental network namespace patches.

Thanks, that sounds interesting.

> If 3, then selinux policy modules may actually help you, else either a new LSM (maybe like LIDS) or a userspace tool which is a front-end to selinux policy, emulating the iptables rules formats, may be what you want?

I don't want to have to choose my LSM at compile time. I want to add support into the kernel at compile time and be able to configure it before I go multi-user. I know this kind of architecture is achievable because iptables allows it.

When I conceive of the security modules as just a firewall between applications on my own box I think, oh yeah, this is no big deal, I might want to limit something that way some time. These are just some additional rules on when to return -EPERM.

So I ask myself why is this situation much less flexible and much harder to use than our network firewall code?

>> My impression is that selinux is one monolithic blob that doesn't allow me to incrementally add matching or action features that I find interesting.
>
> Actually with policy modules it gets much much better.
> I have in fact been able to pretty easily write a short policy module to, say, create an selinux user which ran as root and had full access to the system to do system setup for automated testing. There is a learning curve in having to look at existing modules for maybe a few days to get started, but once you get started the policy modules do make it very easy to add to current policy.

Ok. Interesting. Are these kernel modules?

Still, while I get the general impression that selinux seems to be very close to a generic solution, and that selinux more or less has the architecture we might want, I don't get the impression that selinux does this at a level that is open to other people doing interesting things.

So I still ask the question: can we move this functionality down to the LSM in a way that will solve the composition problem between multiple security modules?

It really seems to me that the LSM as currently structured creates a large barrier to entry
Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel
Casey Schaufler writes:

> --- Eric W. Biederman wrote:
>> Likely. Until we have a generalized LSM interface with 1000 config options like netfilter I don't expect we will have grounds to talk or agree to a common user space interface. Although I could be wrong.
>
> Gulp. I know that many of you are granularity advocates, but I have to say that security derived by tweaking 1000 knobs so that they are all just right seems a little far fetched to me. I see it as poopooing the 3rd and most important part of the reference monitor concept, "small enough to analyze". Sure, you can analyse the 1000 individual checks, but you'll never be able to describe the system behavior as a whole.

Agreed. I wasn't thinking 1000 individual checks but 1000 different capabilities, which could be either checks or actions; basically fundamentally different capabilities. Things like CIPSO, or the ability to store a security label on a file. I would not expect most security policies to use most of them. Neither do I expect Orange Book security to necessarily be what people want to achieve with the LSM.

But I haven't looked at it in enough detail to know how things should be factored; in this case I was simply extrapolating from the iptables experience, where we do have a very large number of options.

The real point is that I would be surprised if we could come to an agreement on a common user space API when we can't agree on how to compile all of the security modules into the kernel and have them play nice with each other.

Assuming we can achieve security modules playing nice with each other using a mechanism similar to iptables, then what needs to be evaluated is the specific table configuration we are using on the system, not the full general set of possibilities. Further, I expect that for the truly security paranoid we want the option to disable further table changes after the tables have been configured.
On another side, personally I don't see where the idea comes from that you can describe system behavior as a whole without analyzing the entire kernel. Has there been work on a sparse-like tool that I'm not aware of to ensure that we always perform the appropriate security checks on the user/kernel interface boundary?

Eric
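The option Eric mentions, disabling further table changes once the tables are configured, amounts to a one-way seal on the rule table. The sketch below is a toy model of that idea only; the class `RuleTable` and its methods are hypothetical and do not correspond to any existing kernel or iptables interface.

```python
class RuleTable:
    """Toy rule table that can be sealed against further changes."""

    def __init__(self):
        self._rules = []
        self._sealed = False

    def add(self, rule):
        """Append a rule, unless the table has been sealed."""
        if self._sealed:
            raise PermissionError("table is sealed; no further changes")
        self._rules.append(rule)

    def seal(self):
        """One-way switch: there is deliberately no unseal()."""
        self._sealed = True

    def rules(self):
        """Return an immutable snapshot of the configured rules."""
        return tuple(self._rules)


if __name__ == "__main__":
    table = RuleTable()
    table.add("deny write label=secret")   # hypothetical rule text
    table.seal()
    print(table.rules())
```

With the table sealed, what has to be analysed is the fixed snapshot returned by `rules()`, which matches the point above about evaluating the specific configuration rather than every possible one.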
Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel
Serge E. Hallyn writes:

> Quoting Eric W. Biederman:
>> It really seems to me that the LSM as currently structured creates a large barrier to entry for people who have just this little thing they want to do that is not possible with any existing security module.
>
> Yes, and it's been made increasingly so, particularly because of the perceived potential for 'abuse'. So to be curt, allowing people like you describe to do something small and interesting is deemed far less important than making sure that the small thing they want to do fits within the LSM mandate and is not a non-upstream module. So that is the concern you would need to address before any other.
>
> Still, I do think that selinux policy modules may do just what you want. The main obstacle appears to be that the 'base' policy is so huge that it's tough to get started to do something small. You also might want to check out LIDS, as its rules are set up pretty much the way you seem to want.

To be very clear. Enhancing the LSM is of interest to me as it looks like that is a way to get people working and playing well together, and ultimately, to be able to run a full distro in a container, I'm going to need this ability.

Examples of better ways to do this in selinux, LIDS, or SMACK are only interesting as far as they suggest how to enhance the LSM.

I honestly think enhancing the LSM would actually reduce its ability to be abused, because nothing would directly own the hook.

My very practical question: How do I run selinux in one container, and SMACK in another?

Eric
Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel
Casey Schaufler writes:

> --- Eric W. Biederman wrote:
>> It really seems to me that the LSM as currently structured creates a large barrier to entry for people who have just this little thing they want to do that is not possible with any existing security module.
>
> I honestly think that the barrier has been more political in nature than technical. I don't know how long you've been watching, but no attempt to get an LSM upstream has escaped exaggerated criticism from certain factions. Only someone who wants to get cut to metaphorical ribbons would submit a little LSM. Maybe that will get better now. I sure hope so.

Yes. Me too. I certainly agree about the political part. My only hope was to suggest something that may reduce what there is to get political about.

Eric
Re: [PATCH] Version 3 (2.6.23-rc8) Smack: Simplified Mandatory Access Control Kernel
Stephen Smalley writes:
On Fri, 2007-10-05 at 09:27 -0700, Casey Schaufler wrote:
--- Kyle Moffett wrote:
On Oct 05, 2007, at 00:45:17, Eric W. Biederman wrote:
Kyle Moffett writes:
On Oct 04, 2007, at 21:44:02, Eric W. Biederman wrote:

Yes. Currently with containers we are taking that one step farther as that solves a wider set of problems.

So containers are exclusive subsets of the system while LSM should be about non-exclusive information restriction.

Yes.

Isolation is a much simpler problem than access control.

Yes. Simple isolation is a different and simpler problem that can be solved with the LSM hooks today. I brought it up for the contrast in what the LSM hooks can be useful for. Hopefully allowing the LSM hooks to be perceived as something other than just hacks for selinux.

Using a security module for isolation is currently uninteresting because it would preclude use of a security module like selinux or smack, because we can have at most one security module loaded at a time.

I have seen several other places where a custom LSM would have been a good solution, but because we don't allow composition, solving a little problem with the LSM is not interesting enough to allow the code to be merged. So I see the current structure of the LSM hooks as hindering development.

I think it is firmly established that selling SELinux to everyone is politically untenable. However enhancing the LSM (even if it is mostly selinux code movement down a layer) I think can be sold.

That would be silly. Smack uses a significantly smaller set of hooks than SELinux requires and still does interesting things. We went through the "replace LSM with the SELinux interface" exercise a couple of years ago; I would hate to have to regurgitate all those discussions.
I don't think Eric is proposing replacing LSM with the SELinux interface as it exists today, but rather making LSM more Netfilter-like and radically refactoring SELinux (and any other security module) to consist of a chain of smaller modules that are more general and reusable, and that can be composed and applied in interesting ways via an iptables-like interface. I'm not sure what that would look like exactly, but it seems reasonable to explore.

Exactly. Refactoring security modules into small, simple, reusable chunks to allow reuse. It might look something like selinux chains or it might not. Inherently it needs to expose what you can do at the existing hook points, and it needs to allow usage by different modules that are compiled in at the same time. It is certainly the case that you would not need to use all of the existing hooks to get something done.

One of the things left unresolved with LSM is userland API, and it does involve more than just returning EPERM or EACCES to applications. You already have patched ls and sshd programs, and have acknowledged the need for more userland modifications to ultimately achieve your own goals. If LSM is going to succeed in the kernel, then ultimately you need some common API for userland so that you don't need separate versions of ls, ps, sshd, etc. for Smack vs. SELinux vs. whatever.

Likely. Until we have a generalized LSM interface with 1000 config options like netfilter I don't expect we will have grounds to talk or agree to a common user space interface. Although I could be wrong.

Eric
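The composition idea discussed in this message, a hook that runs a chain of small independent checks instead of being owned by one module, can be sketched as below. Everything here is hypothetical illustration: the two check functions, the dictionary fields, and `run_hook` are invented for the example and are not LSM, SELinux, or Smack interfaces. The choice to stop at the first denial is also an assumption.

```python
import errno


def isolation_check(subject, obj, op):
    """Hypothetical module: deny access across container boundaries."""
    if subject["container"] != obj["container"]:
        return -errno.EPERM
    return 0


def label_check(subject, obj, op):
    """Hypothetical module: deny writes unless the labels match."""
    if op == "write" and subject["label"] != obj["label"]:
        return -errno.EACCES
    return 0


def run_hook(chain, subject, obj, op):
    """Run every check in order; the first denial decides the result."""
    for check in chain:
        rc = check(subject, obj, op)
        if rc != 0:
            return rc
    return 0


# Both modules are active at once, which a single-owner hook cannot express.
CHAIN = [isolation_check, label_check]

if __name__ == "__main__":
    task = {"container": "c1", "label": "web"}
    log_file = {"container": "c1", "label": "logs"}
    print(run_hook(CHAIN, task, log_file, "read"))
    print(run_hook(CHAIN, task, log_file, "write"))
```

Because each check only ever adds restrictions, the order of the chain affects which error code is returned but not whether access is denied.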
Re: [patch] unprivileged mounts update
Serge E. Hallyn writes:

> Quoting H. Peter Anvin:
>> Miklos Szeredi wrote:
>>> Andrew, please skip this patch, for now.
>>>
>>> Serge found a problem with the fsuid approach: setfsuid(nonzero) will remove filesystem related capabilities. So even if root is trying to set the "user=UID" flag on a mount, access to the target (and in case of bind, the source) is checked with user privileges.
>>>
>>> Root should be able to set this flag on any mountpoint, _regardless_ of permissions.
>>
>> Right, if you're using fsuid != 0, you're not running as root.
>
> Sure, but what I'm not clear on is why, if I've done a prctl(PR_SET_KEEPCAPS, 1) before the setfsuid, I still lose the CAP_FS_MASK perms. I see the special case handling in cap_task_post_setuid(). I'm sure there was a reason for it, but this is a piece of the capability implementation I don't understand right now.

So we drop CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH, CAP_FOWNER, and CAP_FSETID.

Since we are checking CAP_SETUID or CAP_SYS_ADMIN, how is that a problem?

Are there other permission checks that mount is doing that we care about?

>> (fsuid is the equivalent to euid for the filesystem.)
>
> If it were really the equivalent then I could keep my capabilities :) after changing it.

We drop all capabilities after we change the euid.

>> I fail to see how ruid should have *any* impact on mount(2). That seems to be a design flaw.
>
> May be, but just using fsuid at this point stops me from enabling user mounts under /share if /share is chmod 000 (which it is).

I'm dense today. If we can't work out the details we can always use a flag. But what is the problem with fsuid?

You are not trying to test this using a non-default security model, are you?

Eric
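The behaviour under discussion can be written down as a small model: changing fsuid from zero to nonzero removes the filesystem capabilities from the effective set while leaving others, such as CAP_SYS_ADMIN, alone. This is a simplified model of what the thread describes, not the kernel's cap_task_post_setuid() code; the function name is hypothetical, the mask contains only the five capabilities listed in the message, and the reverse transition (nonzero back to zero) is deliberately left out.

```python
# The five capabilities named in the message as being dropped.
CAP_FS_MASK = frozenset({
    "CAP_CHOWN",
    "CAP_DAC_OVERRIDE",
    "CAP_DAC_READ_SEARCH",
    "CAP_FOWNER",
    "CAP_FSETID",
})


def effective_after_setfsuid(effective, old_fsuid, new_fsuid):
    """Model the effective capability set after a setfsuid() call."""
    if old_fsuid == 0 and new_fsuid != 0:
        # Leaving fsuid 0: filesystem capabilities are dropped.
        return frozenset(effective) - CAP_FS_MASK
    # Other transitions are not modelled here.
    return frozenset(effective)


if __name__ == "__main__":
    root_caps = {"CAP_SYS_ADMIN", "CAP_SETUID"} | CAP_FS_MASK
    after = effective_after_setfsuid(root_caps, 0, 1000)
    print(sorted(after))
```

In this model the mount permission check on CAP_SYS_ADMIN still passes after the change, while any path lookup that relied on CAP_DAC_READ_SEARCH no longer does.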
Re: [patch] unprivileged mounts update
Serge E. Hallyn writes:

> Quoting Eric W. Biederman:
>> Are there other permission checks that mount is doing that we care about.
>
> Not mount itself, but in looking up /share/fa/root/home/fa, user fa doesn't have the rights to read /share, and by setting fsuid to fa and dropping CAP_DAC_READ_SEARCH the mount action fails.

Got it.

I'm not certain this is actually a problem; it may be a feature. But it does fly in the face of the general principle of just getting out of root's way so things can get done.

I think we can solve your basic problem by simply doing something like:

    chdir("/share");
    mount(".");

to simply avoid the permission problem.

The practical question is how much do we care.

> But the solution you outlined in your previous post would work around this perfectly.

If we are not using the usual permissions, which user do we use, current->uid? Or do we pass that user someplace?

>>> If it were really the equivalent then I could keep my capabilities :) after changing it.
>>
>> We drop all capabilities after we change the euid.
>
> Not if we've done prctl(PR_SET_KEEPCAPS, 1)

Ah, cap_clear doesn't do the obvious thing.

Eric