Quoting Benoit Lourdelet (blour...@juniper.net):
> Hello,
>
> I am running LXC 0.8.0 on kernel 3.7.9 and am trying to start more than 1000
> small containers: around 10MB of RAM per container.
>
> Starting the first 1600 or so goes smoothly (I have a 32-virtual-core
> machine), but then everything gets very slow: up to a minute per container
> creation. Ultimately the server CPU goes to 100%.
>
> I get this error multiple times in the syslog:
>
> [ 2402.961711] INFO: task lxc-start:128486 blocked for more than 120 seconds.
> [ 2402.961717] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 2402.961724] lxc-start D ffffffff8180cc60 0 128486 1 0x00000000
> [ 2402.961727] ffff883c30359cb0 0000000000000086 ffff883c2ea3c800 ffff883c2f550600
> [ 2402.961734] ffff883c2d955c00 ffff883c30359fd8 ffff883c30359fd8 ffff883c30359fd8
> [ 2402.961741] ffff881fd35e5c00 ffff883c2d955c00 ffff883c3533ec10 ffffffff81cac4e0
> [ 2402.961747] Call Trace:
> [ 2402.961753] [<ffffffff816dbfc9>] schedule+0x29/0x70
> [ 2402.961758] [<ffffffff816dc27e>] schedule_preempt_disabled+0xe/0x10
> [ 2402.961763] [<ffffffff816dadd7>] __mutex_lock_slowpath+0xd7/0x150
> [ 2402.961768] [<ffffffff8158b911>] ? net_alloc_generic+0x21/0x30
> [ 2402.961772] [<ffffffff816da9ea>] mutex_lock+0x2a/0x50
> [ 2402.961777] [<ffffffff8158c044>] copy_net_ns+0x84/0x110
> [ 2402.961782] [<ffffffff81081f4b>] create_new_namespaces+0xdb/0x180
> [ 2402.961787] [<ffffffff8108210c>] copy_namespaces+0x8c/0xd0
> [ 2402.961792] [<ffffffff81055ea0>] copy_process+0x970/0x1550
> [ 2402.961796] [<ffffffff8119e542>] ? do_filp_open+0x42/0xa0
> [ 2402.961801] [<ffffffff81056bc9>] do_fork+0xf9/0x340
> [ 2402.961806] [<ffffffff81199de6>] ? final_putname+0x26/0x50
> [ 2402.961811] [<ffffffff81199ff9>] ? putname+0x29/0x40
> [ 2402.961816] [<ffffffff8101d498>] sys_clone+0x28/0x30
> [ 2402.961819] [<ffffffff816e5c23>] stub_clone+0x13/0x20
> [ 2402.961823] [<ffffffff816e5919>] ? system_call_fastpath+0x16/0x1b
Interesting. It could of course be some funky cache or hash issue, but what does /proc/meminfo show? 10MB of RAM per container may be true in userspace, but the network stacks etc. also take up kernel memory.

I assume the trace above is one container waiting on another to finish its netns allocation. If you could get the dmesg output from echo t > /proc/sysrq-trigger during one of these slow starts, it could show where the other one is hung.

-serge
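For reference, a minimal sketch of how that data could be captured during one of the slow starts (run as root; the output file names here are just placeholders):

    # make sure magic SysRq is enabled (it may already be on this system)
    echo 1 > /proc/sys/kernel/sysrq

    # snapshot kernel-side memory usage
    cat /proc/meminfo > /tmp/meminfo-slow.txt

    # dump the stack of every task into the kernel log, then save it
    echo t > /proc/sysrq-trigger
    dmesg > /tmp/sysrq-t.txt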