On 2013/5/24 20:49, Serge Hallyn wrote:
> Quoting Qiang Huang (h.huangqi...@huawei.com):
>> Hi,
>>
>> I found a tricky problem in LXC. I once made a mistake in the config and set
>>
>> lxc.cgroup.cpuset.cpus = -1
>>
>> Of course the start failed, but then "lxc-ls --active" showed the container
>> as active.
>>
>> error message is:
>> # lxc-start -n hq111 -f config_hq -l TRACE
>> lxc-start: Invalid argument - write /cgroup/lxc/hq111/cpuset.cpus : Invalid argument
>> lxc-start: Error setting cpuset.cpus to -1 for lxc/hq111
>>
>> lxc-start: failed to setup the cgroups for 'hq111'
>> lxc-start: failed to spawn 'hq111'
>> lxc-start: Device or resource busy - failed to remove cgroup '/cgroup/lxc/hq111'
>>
>>
>> This is not hard to reproduce if you keep trying, though it does not happen
>> every time. I read through the code and found that recursive_rmdir() fails
>> because rmdir() sometimes returns -1. Any idea how to fix this?
> 
> Could you tell us exactly which version this is, and exactly how you
> created the container?  When I do it in ubuntu saucy (roughly 0.9.0 lxc),
> the cgroup gets correctly removed.
> 

Hi Serge,

I think I have found the reason: when setup_cgroup() fails, the child process
may still exist when the parent tries to destroy the cgroup. (We have no sync
mechanism to ensure the child exits before the parent when something goes wrong.)
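
To illustrate the kind of sync I mean, here is a minimal sketch, not LXC code.
The helper name reap_child_then_rmdir() is hypothetical. It assumes the parent
knows the child's pid and that killing the child is acceptable on the error
path. The point is only the ordering: reap the child first, then remove the
directory.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an LXC function): make sure the child is gone
 * before removing the directory it may still be attached to.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int reap_child_then_rmdir(pid_t child, const char *path)
{
	int status;

	/* Ask the child to die; ESRCH means it has already exited. */
	if (kill(child, SIGKILL) < 0 && errno != ESRCH)
		return -1;

	/* Block until the child is reaped, retrying if a signal interrupts us. */
	while (waitpid(child, &status, 0) < 0) {
		if (errno == ECHILD)
			break;		/* somebody else already reaped it */
		if (errno != EINTR)
			return -1;
	}

	/* Only now try to remove the directory. */
	return rmdir(path);
}
```

With a real cgroup the path would be something like the '/cgroup/lxc/hq111'
from the log above. Whether rmdir() can still fail briefly after the wait
depends on the kernel, so this is a sketch of the ordering, not a guaranteed fix.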

commit 6031a6e5f939bda07d98768d34dafae677a7dfeb
Author: Dwight Engen <dwight.en...@oracle.com>
Date:   Wed May 15 12:27:34 2013 -0400

    set non device cgroup items before the cgroup is entered

    This allows some special cgroup items such as memory.kmem.limit_in_bytes
    to be successfully set, since they must be set before any task is put
    into the cgroup.

    The devices cgroup is setup later giving the container a chance to mount
    file systems before the device it might want to mount from becomes
    unavailable.

    Signed-off-by: Dwight Engen <dwight.en...@oracle.com>
    Signed-off-by: Serge Hallyn <serge.hal...@ubuntu.com>

This patch moved setup_cgroup() before lxc_cgroup_enter(). When setup_cgroup()
fails, there is no task in the cgroup yet, so removing the cgroup no longer fails.
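
The effect of the reordering can be shown with a toy model. This is only a
sketch: the toy_* functions and struct toy_cgroup are made up for illustration
and are not the real setup_cgroup() or lxc_cgroup_enter(). The model takes the
worst case of the race, where the child is still attached when the parent
cleans up.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a cgroup directory: just counts attached tasks. */
struct toy_cgroup {
	int ntasks;
};

/* Stand-in for writing cgroup settings; fails like an invalid cpuset.cpus. */
static int toy_setup(struct toy_cgroup *cg, bool valid_config)
{
	(void)cg;
	if (!valid_config) {
		errno = EINVAL;
		return -1;
	}
	return 0;
}

/* Stand-in for moving the child into the cgroup. */
static void toy_enter(struct toy_cgroup *cg)
{
	cg->ntasks++;
}

/* Stand-in for rmdir(): refuses to remove a cgroup that still has tasks. */
static int toy_remove(const struct toy_cgroup *cg)
{
	if (cg->ntasks > 0) {
		errno = EBUSY;
		return -1;
	}
	return 0;
}

/* Old order: enter first, then setup. Returns the result of the cleanup. */
static int start_old_order(bool valid_config)
{
	struct toy_cgroup cg = { 0 };

	toy_enter(&cg);
	if (toy_setup(&cg, valid_config) < 0)
		return toy_remove(&cg);	/* child still attached: EBUSY */
	return 0;
}

/* New order: setup first, enter only if setup succeeded. */
static int start_new_order(bool valid_config)
{
	struct toy_cgroup cg = { 0 };

	if (toy_setup(&cg, valid_config) < 0)
		return toy_remove(&cg);	/* cgroup is empty: removal succeeds */
	toy_enter(&cg);
	return 0;
}
```

With an invalid config, start_old_order() ends with EBUSY from the cleanup,
while start_new_order() cleans up without error.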

So my problem no longer exists on the latest code, but there are still
potential problems if we don't ensure the child exits before the parent.
Michael's problem might also be caused by this.





_______________________________________________
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel
