On Feb 16, 2014, at 8:30 PM, Stéphane Graber <stgra...@ubuntu.com> wrote:

> On Sun, Feb 16, 2014 at 08:22:40PM -0500, Brian Campbell wrote:
>> 
>> On Feb 16, 2014, at 12:53 PM, Stéphane Graber <stgra...@ubuntu.com> wrote:
>> 
>>> On Sun, Feb 16, 2014 at 12:49:44PM -0500, Brian Campbell wrote:
>>>> On Feb 16, 2014, at 12:23 PM, Stéphane Graber <stgra...@ubuntu.com> wrote:
>>>> 
>>>>> On Sun, Feb 16, 2014 at 03:51:50AM -0500, Brian Campbell wrote:
>>>>>> I'm running Debian Jessie (testing), and compiled lxc from a fresh git 
>>>>>> clone (7da8ab1: close inherited fds when we still have proc mounted). I 
>>>>>> would like to create a user container without using root privileges, so 
>>>>>> I set up UID mappings such that my user ID would map to root within the 
>>>>>> container. From what I can tell, this is all that should be necessary to 
>>>>>> get it to use user namespaces to operate unprivileged:
>>>>>> 
>>>>>> lambda@gherkin:lxc$ cat ~/.config/lxc/default.conf
>>>>>> lxc.id_map = u 0 1000 9999
>>>>>> lxc.id_map = g 0 1000 9999
>>>>>> lambda@gherkin:lxc$ id
>>>>>> uid=1000(lambda) gid=1000(lambda) groups=1000(lambda),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),104(scanner),109(bluetooth),112(netdev),125(vboxusers)
>>>>> 
>>>>> From the above, it seems like you didn't configure /etc/subuid and
>>>>> /etc/subgid. Without those (and a version of the shadow package which
>>>>> supports them), you won't be able to switch to those UID ranges.
>>>> 
>>>> Nope, I haven't done anything with them, and it looks like Debian's passwd 
>>>> doesn't have subuid/subgid support. Taking a look at the Ubuntu changelog, 
>>>> it looks like they were added as a patch to the Ubuntu package in 
>>>> 1:4.1.5.1-1ubuntu5. Is there a Debian package already available for this, 
>>>> or should I try to extract the patches from the Ubuntu package and build 
>>>> my own?
>>>> 
>>>> Ah, looks like I should have read this: 
>>>> https://s3hh.wordpress.com/2013/07/19/creating-and-using-containers-without-privilege/
>>>>  before trying this; all I had seen was 
>>>> https://www.mail-archive.com/lxc-users@lists.sourceforge.net/msg05859.html 
>>>> which didn't mention anything about /etc/subuid and /etc/subgid.
>>> 
>>> The shadow change was submitted to Debian at the same time we pushed it
>>> to Ubuntu, but last I checked it was still in an unreleased git
>>> branch...
>> 
>> Ah, the Debian packaging is in Git. The package metadata still refers to 
>> Subversion, and I'd looked there but not seen any recent updates. But I've 
>> now tracked down the Debian git tree in 
>> git://git.debian.org/git/pkg-shadow/shadow. However, it looks like this git 
>> tree is based on an unreleased upstream version 4.2, and only contains the 
>> packaging changes, not the actual source changes.
>> 
>> After some further sleuthing, I discovered that there's a new upstream 
>> repository at https://github.com/shadow-maint/shadow
>> 
>> I was able to then build a new package by running "make dist" in the 
>> upstream repo to generate shadow-4.2.tar.bz2, then in the packaging repo 
>> using git-buildpackage 
>> 
>>    $ gbp import-orig ../shadow-4.2.tar.bz2 --no-pristine-tar --no-sign-tags
>>    $ gbp buildpackage -us -uc
>> 
>> Just documenting this here so that if anyone else finds this thread they'll 
>> be able to do the same.
>> 
>> However, even after installing the above, it still gives the same error. I 
>> tried rerunning configure and rebuilding lxc after installing the new shadow 
>> package in case its configuration depended on that existing, but that made 
>> no difference.
>> 
>>    lambda@gherkin:lxc$ grep lambda /etc/sub* 2>/dev/null
>>    /etc/subgid:lambda:100000:65536
>>    /etc/subuid:lambda:100000:65536
>>    lambda@gherkin:lxc$ cat ~/.config/lxc/default.conf
>>    lxc.id_map = u 0 100000 65536
>>    lxc.id_map = g 0 100000 65536
>>    lambda@gherkin:lxc$ lxc-create -l DEBUG -o lxc.log --name precise-test -t download -- -d ubuntu -r precise -a amd64
>>    unshare: Operation not permitted
>>    read pipe: No such file or directory
>>    lxc-create: Error chowning /home/lambda/.local/share/lxc/precise-test/rootfs to container root
>>    lxc-create: Error creating backing store type (none) for precise-test
>>    lxc-create: Error creating container precise-test
>>    lambda@gherkin:lxc$ cat lxc.log
>>        lxc-create 1392583412.774 WARN     lxc_log - lxc_log_init called with log already initialized
>>        lxc-create 1392583412.774 INFO     lxc_confile - read uid map: type u nsid 0 hostid 100000 range 65536
>>        lxc-create 1392583412.774 INFO     lxc_confile - read uid map: type g nsid 0 hostid 100000 range 65536
>>        lxc-create 1392583412.776 ERROR    lxc_container - Error chowning /home/lambda/.local/share/lxc/precise-test/rootfs to container root
>>        lxc-create 1392583412.776 ERROR    lxc_container - Error creating backing store type (none) for precise-test
>>        lxc-create 1392583412.776 ERROR    lxc_create_ui - Error creating container precise-test
>> 
>> I've tried stracing to see if I could get any more useful information. I 
>> think this is the right incantation: strace has to run as root in order 
>> to trace any suid root executables that lxc-create might call, but sudo 
>> clears the environment, so I set it up again by hand, and I run the 
>> actual process as my own UID with strace -u so that it takes the 
>> unprivileged code paths:
>> 
>>    lambda@gherkin:lxc$ sudo env HOME=/home/lambda LD_LIBRARY_PATH=/usr/local/lib strace -u lambda -v -tt -f -o lxc.trace lxc-create -l DEBUG -o lxc.log --name precise-test -t download -- -d ubuntu -r precise -a amd64
>>    unshare: Operation not permitted
>>    read pipe: No such file or directory
>>    lxc-create: Error chowning /home/lambda/.local/share/lxc/precise-test/rootfs to container root
>>    lxc-create: Error creating backing store type (none) for precise-test
>>    lxc-create: Error creating container precise-test
>> 
>> The full trace is available at http://ephemera.continuation.org/lxc.trace; 
>> this looks like the relevant bit:
>> 
>> 6969  19:51:07.737140 pipe([3, 5])      = 0
>> 6969  19:51:07.737163 pipe([6, 7])      = 0
>> 6969  19:51:07.737186 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f06b20f2b10) = 6970
>> 6970  19:51:07.737276 set_robust_list(0x7f06b20f2b20, 0x18 <unfinished ...>
>> 6969  19:51:07.737285 close(5 <unfinished ...>
>> 6970  19:51:07.737293 <... set_robust_list resumed> ) = 0
>> 6969  19:51:07.737300 <... close resumed> ) = 0
>> 6969  19:51:07.737312 close(6)          = 0
>> 6969  19:51:07.737334 read(3,  <unfinished ...>
>> 6970  19:51:07.737344 close(3)          = 0
>> 6970  19:51:07.737366 close(7)          = 0
>> 6970  19:51:07.737389 open("/dev/pts/13", O_RDWR|O_NONBLOCK) = 3
>> 6970  19:51:07.737421 fcntl(3, F_GETFL) = 0x8802 (flags O_RDWR|O_NONBLOCK|O_LARGEFILE)
>> 6970  19:51:07.737444 fcntl(3, F_SETFL, O_RDWR|O_LARGEFILE) = 0
>> 6970  19:51:07.737466 close(0)          = 0
>> 6970  19:51:07.737487 close(1)          = 0
>> 6970  19:51:07.737507 close(2)          = 0
>> 6970  19:51:07.737535 dup2(3, 0)        = 0
>> 6970  19:51:07.737557 dup2(3, 1)        = 1
>> 6970  19:51:07.737578 dup2(3, 2)        = 2
>> 6970  19:51:07.737599 close(3)          = 0
>> 6970  19:51:07.737623 unshare(CLONE_NEWNS|0x10000000) = -1 EPERM (Operation not permitted)
>> 6970  19:51:07.737649 dup(2)            = 3
>> 6970  19:51:07.737673 fcntl(3, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
>> 6970  19:51:07.737708 fstat(3, {st_dev=makedev(0, 11), st_ino=16, st_mode=S_IFCHR|0620, st_nlink=1, st_uid=1000, st_gid=5, st_blksize=1024, st_blocks=0, st_rdev=makedev(136, 13), st_atime=2014/02/16-19:51:04, st_mtime=2014/02/16-19:51:04, st_ctime=2014/02/16-15:34:12}) = 0
>> 6970  19:51:07.737745 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f06b2111000
>> 6970  19:51:07.737771 lseek(3, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
>> 6970  19:51:07.737812 write(3, "unshare: Operation not permitted"..., 33) = 33
>> 6970  19:51:07.737849 close(3)          = 0
>> 6970  19:51:07.737871 munmap(0x7f06b2111000, 4096) = 0
>> 6970  19:51:07.737964 exit_group(1)     = ? 
>> 
>> 
>>> For unprivileged containers with current kernel and LXC (and a distro
>>> with the new shadow), there's also an article I wrote a little while
>>> back at:
>>> https://www.stgraber.org/2014/01/17/lxc-1-0-unprivileged-containers/
>> 
>> Thanks! I had been looking around to see if there was any documentation on 
>> this or an explanation for how it should work, but hadn't found that post. 
>> That's quite helpful.
>> 
>> I note from the above that you mention the following are necessary to get 
>> this to fully work:
>> 
>>>     • Kernel: 3.13 + a couple of staging patches (which Ubuntu has in its kernel)
>>>     • User namespaces enabled in the kernel
>>>     • A very recent version of shadow that supports subuid/subgid
>>>     • Per-user cgroups on all controllers (which I turned on a couple of weeks ago)
>>>     • LXC 1.0 beta2 or higher (released two days ago)
>>>     • A version of PAM with a loginuid patch that’s yet to be in any released version
>> 
>> 
>> I have user namespaces enabled in the kernel, after finding and building the 
>> package I have the new version of shadow, and I'm building lxc from Git, so 
>> those bases should be covered. So I'm wondering if one of those other items 
>> is what I'm missing.
> 
> Did you install uidmap from that new version of shadow? It may be that
> you have support for the options and the configfile but not the setuid
> tools to actually set the uid ranges.
> 
> In Ubuntu this is done with two separate setuid tools (newuidmap and
> newgidmap) which are both contained in the uidmap package from that
> newer shadow source.
> 
> LXC in Ubuntu depends on this (well, Recommends technically) but if you
> built everything by hand, it's possible you somehow missed this.

Yes, I installed it after building it:

lambda@gherkin:lxc$ dpkg -l uidmap
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name    Version  Architecture  Description
+++-=======-========-=============-=============================
ii  uidmap  1:4.2-1  amd64         programs to help use subuids
lambda@gherkin:lxc$ which newuidmap
/usr/bin/newuidmap
lambda@gherkin:lxc$ which newgidmap
/usr/bin/newgidmap
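
Being on the PATH does not by itself show that the helpers are installed setuid root, which is what the "setuid tools" remark in the quoted reply calls for. A minimal sketch of such a check in C follows. The function names is_setuid_root() and report_idmap_helpers() are made up for this example; the two paths are the ones reported by `which` above.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Returns 1 if `path` is owned by root and has the setuid bit set,
 * 0 if it exists but is not setuid root, and -1 if stat() fails.
 * Hypothetical helper, not part of lxc or shadow.
 */
int is_setuid_root(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    return (st.st_uid == 0 && (st.st_mode & S_ISUID)) ? 1 : 0;
}

/* Prints one line per helper, using the paths shown by `which`. */
void report_idmap_helpers(void)
{
    const char *helpers[] = { "/usr/bin/newuidmap", "/usr/bin/newgidmap" };

    for (size_t i = 0; i < sizeof(helpers) / sizeof(helpers[0]); i++)
        printf("%s: %d\n", helpers[i], is_setuid_root(helpers[i]));
}
```

Calling report_idmap_helpers() from main() prints 1 for a helper that is setuid root, 0 for one that is not, and -1 for a path that cannot be stat()ed.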

From the strace, it doesn't look like these are ever being called by 
lxc-create. Taking a look through the source, they are supposed to be called 
by lxc-usernsexec in the parent process, after it has forked the child and 
the child has called unshare(). The parent waits for the child to write to 
the pipe before calling map_child_uids():

    close(pipe1[1]);
    close(pipe2[0]);
    if (read(pipe1[0], buf, 1) < 1) {
        perror("read pipe");
        exit(1);
    }

    buf[0] = '1';
    if (map_child_uids(pid, active_map)) {
        fprintf(stderr, "error mapping child\n");
        ret = 0;
    }
    if (write(pipe2[1], buf, 1) < 0) {
        perror("write to pipe");
        exit(1);
    }

And before the child writes to the pipe, it calls unshare(CLONE_NEWUSER | 
CLONE_NEWNS):

        // Child.

        close(pipe1[0]);
        close(pipe2[1]);
        opentty(ttyname);

        ret = unshare(flags);
        if (ret < 0) {
            perror("unshare");
            return 1;
        }
        buf[0] = '1';
        if (write(pipe1[1], buf, 1) < 1) {
            perror("write pipe");
            exit(1);
        }
        if (read(pipe2[0], buf, 1) < 1) {
            perror("read pipe");
            exit(1);
        }
        if (buf[0] != '1') {
            fprintf(stderr, "parent had an error, child exiting\n");
            exit(1);
        }

        close(pipe1[1]);
        close(pipe2[0]);
        return do_child((void*)argv);

So this is failing before it's even had a chance to map the UIDs/GIDs.
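
The handshake described above can be reproduced in isolation. The following is a minimal sketch, not the lxc-usernsexec code: run_handshake() is a hypothetical name, the child's unshare() call is replaced by a flag, and the parent's map_child_uids() call is replaced by a comment. It shows why the parent reports a read error as soon as the child exits before writing to the first pipe.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Runs the two-pipe handshake with a child whose setup step either
 * succeeds (child_setup_ok != 0) or fails. Returns 0 if the handshake
 * completed, 1 if the parent got no byte on the first pipe, and -1 on
 * a setup error. No namespaces are created here.
 */
int run_handshake(int child_setup_ok)
{
    int pipe1[2], pipe2[2]; /* pipe1: child -> parent, pipe2: parent -> child */
    char buf[1];
    pid_t pid;
    int status, result = 0;

    if (pipe(pipe1) < 0 || pipe(pipe2) < 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        close(pipe1[0]);
        close(pipe2[1]);
        /* Stand-in for unshare(): on failure, exit without writing. */
        if (!child_setup_ok)
            _exit(1);
        buf[0] = '1';
        if (write(pipe1[1], buf, 1) < 1)
            _exit(1);
        /* Wait until the parent has finished its mapping step. */
        if (read(pipe2[0], buf, 1) < 1 || buf[0] != '1')
            _exit(1);
        _exit(0);
    }

    /* Parent: close the unused ends so that a dead child yields EOF. */
    close(pipe1[1]);
    close(pipe2[0]);
    if (read(pipe1[0], buf, 1) < 1) {
        /* Corresponds to the "read pipe" error path in the parent. */
        result = 1;
    } else {
        /* Stand-in for map_child_uids(pid, active_map). */
        buf[0] = '1';
        if (write(pipe2[1], buf, 1) < 0)
            result = -1;
    }
    close(pipe1[0]);
    close(pipe2[1]);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return result;
}
```

With child_setup_ok set to 0, the parent's read() returns 0 (end of file) rather than -1. read() does not set errno in that case, so the "No such file or directory" part of the "read pipe" message is most likely a stale errno value printed by perror(), not a sign of a missing file.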

I tried the demo_userns.c example code from this LWN article 
https://lwn.net/Articles/532593/ and got the same result:

lambda@gherkin:userns$ ./demo_userns
clone: Operation not permitted

So it looks like something is preventing me from calling clone(CLONE_NEWUSER) 
or unshare(CLONE_NEWUSER).
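
To take lxc out of the picture entirely, the failing call can be probed on its own. A minimal sketch, assuming a Linux system with glibc: probe_userns() is a hypothetical helper that calls unshare(CLONE_NEWUSER) in a forked child and hands the resulting errno back over a pipe. The static assertion records that the raw 0x10000000 flag in the strace output is CLONE_NEWUSER.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* strace printed the second unshare() flag as a raw number. */
_Static_assert(CLONE_NEWUSER == 0x10000000, "0x10000000 is CLONE_NEWUSER");

/*
 * Calls unshare(CLONE_NEWUSER) in a forked child so that the caller's
 * own namespaces are untouched. Returns 0 if unshare() succeeded, the
 * child's errno value if it failed, or -1 if the probe itself failed.
 */
int probe_userns(void)
{
    int fds[2], err = 0, status;
    pid_t pid;
    ssize_t n;

    if (pipe(fds) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        if (unshare(CLONE_NEWUSER) < 0)
            err = errno;
        /* Report the errno value (0 on success) back to the parent. */
        if (write(fds[1], &err, sizeof(err)) != (ssize_t)sizeof(err))
            _exit(2);
        _exit(0);
    }
    close(fds[1]);
    n = read(fds[0], &err, sizeof(err));
    close(fds[0]);
    if (waitpid(pid, &status, 0) < 0 || n != (ssize_t)sizeof(err))
        return -1;
    return err;
}
```

A return value of 0 means the kernel allowed the call; EPERM corresponds to the "Operation not permitted" seen above. The value can be passed to strerror() for printing.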

The only documentation on CLONE_NEWUSER I can find is that LWN article, and 
it says that as of Linux 3.8 no privilege should be needed to call 
clone(CLONE_NEWUSER), so I'm somewhat puzzled as to why this is failing.
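
Two prerequisites can be checked mechanically: that the running kernel is at least 3.8, and whether a distribution-specific restriction is in place. The sketch below is a hedged illustration, not a diagnosis. Some distribution kernels have carried a patch that adds a kernel.unprivileged_userns_clone sysctl to switch off unprivileged CLONE_NEWUSER; whether the kernel in this thread has it is an assumption, so the code only reports the file if it exists. All function names are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Parses "MAJOR.MINOR..." from a kernel release string; 0 on success. */
int parse_release(const char *release, int *major, int *minor)
{
    assert(release != NULL && major != NULL && minor != NULL);
    return sscanf(release, "%d.%d", major, minor) == 2 ? 0 : -1;
}

/* Returns 1 if major.minor is at least 3.8, else 0. */
int at_least_3_8(int major, int minor)
{
    return major > 3 || (major == 3 && minor >= 8);
}

/*
 * Reads a single integer from a file such as a /proc/sys entry.
 * Returns 0 and stores the value on success, -1 if the file is
 * absent or unreadable.
 */
int read_int_file(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = fscanf(f, "%d", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Prints the running kernel version and the restriction knob, if any. */
void report_userns_prereqs(void)
{
    struct utsname u;
    int major, minor, knob;

    if (uname(&u) == 0 && parse_release(u.release, &major, &minor) == 0)
        printf("kernel %d.%d, >= 3.8: %s\n", major, minor,
               at_least_3_8(major, minor) ? "yes" : "no");
    /* ASSUMPTION: this sysctl exists only on kernels with a distro patch. */
    if (read_int_file("/proc/sys/kernel/unprivileged_userns_clone", &knob) == 0)
        printf("unprivileged_userns_clone = %d\n", knob);
    else
        printf("unprivileged_userns_clone: not present\n");
}
```

If the sysctl file is present and reads 0, that would be one possible explanation for EPERM on a kernel that is otherwise new enough; if it is absent, the cause has to be looked for elsewhere.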

>> 
>> I'm running 3.12, not 3.13, nor whatever patches Ubuntu has. Looking through 
>> the kernel history for anything mentioning "namespace" between 3.12 and 3.13 
>> it looks like there are some features relevant to networking but not to 
>> basic functionality like this. Is there anything I need from the Ubuntu 
>> patches? I also don't have the PAM loginuid patch mentioned, but looking at 
>> that it seems to only affect SSHing into a user container. Can you provide 
>> more details on the per-user cgroups on all controllers? What handles that 
>> on Ubuntu?
>> 
>> -- Brian
>> 
>> _______________________________________________
>> lxc-devel mailing list
>> lxc-devel@lists.linuxcontainers.org
>> http://lists.linuxcontainers.org/listinfo/lxc-devel
> 
> -- 
> Stéphane Graber
> Ubuntu developer
> http://www.ubuntu.com

