Quoting Benoit Lourdelet (blour...@juniper.net):
> Hello,
>
> I am running LXC 0.8.0 on kernel 3.7.9 and am trying to start more than 1000
> small containers: around 10MB of RAM per container.
>
> Starting the first 1600 or so goes smoothly (I have a 32-virtual-core
> machine), but then everything gets very slow: up to a minute per container
> creation. Ultimately the server CPU goes to 100%.
>
> I get this error multiple times in the syslog:
>
> [ 2402.961711] INFO: task lxc-start:128486 blocked for more than 120 seconds.
> [ 2402.961717] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 2402.961724] lxc-start D ffffffff8180cc60 0 128486 1 0x00000000
> [ 2402.961727] ffff883c30359cb0 0000000000000086 ffff883c2ea3c800 ffff883c2f550600
> [ 2402.961734] ffff883c2d955c00 ffff883c30359fd8 ffff883c30359fd8 ffff883c30359fd8
> [ 2402.961741] ffff881fd35e5c00 ffff883c2d955c00 ffff883c3533ec10 ffffffff81cac4e0
> [ 2402.961747] Call Trace:
> [ 2402.961753] [<ffffffff816dbfc9>] schedule+0x29/0x70
> [ 2402.961758] [<ffffffff816dc27e>] schedule_preempt_disabled+0xe/0x10
> [ 2402.961763] [<ffffffff816dadd7>] __mutex_lock_slowpath+0xd7/0x150
> [ 2402.961768] [<ffffffff8158b911>] ? net_alloc_generic+0x21/0x30
> [ 2402.961772] [<ffffffff816da9ea>] mutex_lock+0x2a/0x50
> [ 2402.961777] [<ffffffff8158c044>] copy_net_ns+0x84/0x110
> [ 2402.961782] [<ffffffff81081f4b>] create_new_namespaces+0xdb/0x180
> [ 2402.961787] [<ffffffff8108210c>] copy_namespaces+0x8c/0xd0
> [ 2402.961792] [<ffffffff81055ea0>] copy_process+0x970/0x1550
> [ 2402.961796] [<ffffffff8119e542>] ? do_filp_open+0x42/0xa0
> [ 2402.961801] [<ffffffff81056bc9>] do_fork+0xf9/0x340
> [ 2402.961806] [<ffffffff81199de6>] ? final_putname+0x26/0x50
> [ 2402.961811] [<ffffffff81199ff9>] ? putname+0x29/0x40
> [ 2402.961816] [<ffffffff8101d498>] sys_clone+0x28/0x30
> [ 2402.961819] [<ffffffff816e5c23>] stub_clone+0x13/0x20
> [ 2402.961823] [<ffffffff816e5919>] ? system_call_fastpath+0x16/0x1b
Interesting. It could of course be some funky cache or hash issue, but what does /proc/meminfo show? 10MB of RAM per container may be true in userspace, but the network stacks etc. also take up kernel memory.

I assume the trace above is one container waiting on another to finish its netns allocation. If you could get the dmesg output from echo t > /proc/sysrq-trigger during one of these slow starts, it could show where the other one is hung.

-serge
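For reference, a minimal sketch of how that data could be captured during one of the slow starts (run as root; the output file names here are just placeholders):

    # make sure magic SysRq is enabled (it may already be on this system)
    echo 1 > /proc/sys/kernel/sysrq

    # snapshot kernel-side memory usage
    cat /proc/meminfo > /tmp/meminfo-slow.txt

    # dump the stack of every task into the kernel log, then save it
    echo t > /proc/sysrq-trigger
    dmesg > /tmp/sysrq-t.txt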