Hello Serge,
I am running on a 256GB RAM host, with plenty of free memory. I issued echo t > /proc/sysrq-trigger while containers were taking 30s to start; it gave the following. Nothing caught my attention. This block is repeated for each running container:

[46825.718046] rt_rq[31]:/lxc/lwb2002
[46825.718048]   .rt_nr_running : 0
[46825.718050]   .rt_throttled  : 0
[46825.718052]   .rt_time       : 0.000000
[46825.718053]   .rt_runtime    : 0.000000

then:

[46825.718056] rt_rq[31]:/lxc
[46825.718059]   .rt_nr_running : 0
[46825.718060]   .rt_throttled  : 0
[46825.718062]   .rt_time       : 0.000000
[46825.718064]   .rt_runtime    : 0.000000

[46825.718069] rt_rq[31]:/libvirt/lxc
[46825.718071]   .rt_nr_running : 0
[46825.718073]   .rt_throttled  : 0
[46825.718075]   .rt_time       : 0.000000
[46825.718077]   .rt_runtime    : 0.000000

[46825.718080] rt_rq[31]:/libvirt/qemu
[46825.718083]   .rt_nr_running : 0
[46825.718084]   .rt_throttled  : 0
[46825.718086]   .rt_time       : 0.000000
[46825.718088]   .rt_runtime    : 0.000000

[46825.718091] rt_rq[31]:/libvirt
[46825.718093]   .rt_nr_running : 0
[46825.718095]   .rt_throttled  : 0
[46825.718097]   .rt_time       : 0.000000
[46825.718099]   .rt_runtime    : 0.000000

[46825.718105] rt_rq[31]:/
[46825.718107]   .rt_nr_running : 0
[46825.718109]   .rt_throttled  : 0
[46825.718111]   .rt_time       : 0.000000
[46825.718113]   .rt_runtime    : 950.000000

[46825.718115] runnable tasks:
[46825.718115]   task   PID   tree-key   switches   prio   exec-runtime   sum-exec   sum-sleep
[46825.718115] ----------------------------------------------------------------------------------------------------------
[46825.727356]

regards

Benoit

root@ieng-serv06:/root/scripts# cat /proc/meminfo
MemTotal:       264124804 kB
MemFree:        234107144 kB
Buffers:          3429676 kB
Cached:           1650712 kB
SwapCached:             0 kB
Active:          10496560 kB
Inactive:         3224732 kB
Active(anon):     8695932 kB
Inactive(anon):     84348 kB
Active(file):     1800628 kB
Inactive(file):   3140384 kB
Unevictable:            0 kB
Mlocked:                0 kB
SwapTotal:              0 kB
SwapFree:               0 kB
Dirty:                136 kB
Writeback:              0 kB
AnonPages:        8640928 kB
Mapped:             17868 kB
Shmem:             139380 kB
Slab:            10287240 kB
SReclaimable:     5977640 kB
SUnreclaim:       4309600 kB
KernelStack:       312000 kB
PageTables:       1989464 kB
NFS_Unstable:           0 kB
Bounce:                 0 kB
WritebackTmp:           0 kB
CommitLimit:    132062400 kB
Committed_AS:    76627724 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      1117304 kB
VmallocChunk:   34222330512 kB
HardwareCorrupted:      0 kB
AnonHugePages:          0 kB
HugePages_Total:        0
HugePages_Free:         0
HugePages_Rsvd:         0
HugePages_Surp:         0
Hugepagesize:        2048 kB
DirectMap4k:       206416 kB
DirectMap2M:      5003264 kB
DirectMap1G:    263192576 kB

On 11 Mar 2013, at 18:41, Serge Hallyn wrote:

> Quoting Benoit Lourdelet (blour...@juniper.net):
>> Hello,
>>
>> I am running LXC 0.8.0 on kernel 3.7.9 and trying to start more than 1000 small
>> containers: around 10MB of RAM per container.
>>
>> Starting the first 1600 or so happens smoothly - I have a 32 virtual core
>> machine - but then everything gets very slow:
>>
>> up to a minute per container creation. Ultimately the server CPU goes to 100%.
>>
>> I get this error multiple times in the syslog:
>>
>> [ 2402.961711] INFO: task lxc-start:128486 blocked for more than 120 seconds.
>> [ 2402.961717] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 2402.961724] lxc-start  D ffffffff8180cc60  0 128486  1 0x00000000
>> [ 2402.961727]  ffff883c30359cb0 0000000000000086 ffff883c2ea3c800 ffff883c2f550600
>> [ 2402.961734]  ffff883c2d955c00 ffff883c30359fd8 ffff883c30359fd8 ffff883c30359fd8
>> [ 2402.961741]  ffff881fd35e5c00 ffff883c2d955c00 ffff883c3533ec10 ffffffff81cac4e0
>> [ 2402.961747] Call Trace:
>> [ 2402.961753]  [<ffffffff816dbfc9>] schedule+0x29/0x70
>> [ 2402.961758]  [<ffffffff816dc27e>] schedule_preempt_disabled+0xe/0x10
>> [ 2402.961763]  [<ffffffff816dadd7>] __mutex_lock_slowpath+0xd7/0x150
>> [ 2402.961768]  [<ffffffff8158b911>] ? net_alloc_generic+0x21/0x30
>> [ 2402.961772]  [<ffffffff816da9ea>] mutex_lock+0x2a/0x50
>> [ 2402.961777]  [<ffffffff8158c044>] copy_net_ns+0x84/0x110
>> [ 2402.961782]  [<ffffffff81081f4b>] create_new_namespaces+0xdb/0x180
>> [ 2402.961787]  [<ffffffff8108210c>] copy_namespaces+0x8c/0xd0
>> [ 2402.961792]  [<ffffffff81055ea0>] copy_process+0x970/0x1550
>> [ 2402.961796]  [<ffffffff8119e542>] ? do_filp_open+0x42/0xa0
>> [ 2402.961801]  [<ffffffff81056bc9>] do_fork+0xf9/0x340
>> [ 2402.961806]  [<ffffffff81199de6>] ? final_putname+0x26/0x50
>> [ 2402.961811]  [<ffffffff81199ff9>] ? putname+0x29/0x40
>> [ 2402.961816]  [<ffffffff8101d498>] sys_clone+0x28/0x30
>> [ 2402.961819]  [<ffffffff816e5c23>] stub_clone+0x13/0x20
>> [ 2402.961823]  [<ffffffff816e5919>] ? system_call_fastpath+0x16/0x1b
>
> Interesting. It could of course be some funky cache or hash issue, but
> what does /proc/meminfo show? 10M of RAM per container may be true in
> userspace, but the network stacks etc. are also taking up kernel memory.
>
> I assume the above trace is one container waiting on another to finish
> its netns alloc. If you could get dmesg output from echo t >
> /proc/sysrq-trigger during one of these slow starts, it could show where
> the other is hung.
>
> -serge

_______________________________________________
Lxc-users mailing list
Lxc-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-users
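[Editorial note] Serge's point about kernel memory can be sanity-checked against the meminfo output quoted above. Summing Slab, KernelStack, and PageTables is only a rough proxy for per-container kernel overhead (it ignores vmalloc and other allocations), but it is instructive here. A minimal sketch; the heredoc simply replays the three values from the thread, and the 1600-container divisor comes from Benoit's report. On the live host you would read /proc/meminfo directly instead:

```shell
# Sum the main kernel-side memory consumers reported by /proc/meminfo.
# The heredoc replays the values quoted earlier in this thread; on a
# live host, use:  awk '...' /proc/meminfo
sum_kb=$(awk '/^(Slab|KernelStack|PageTables):/ { s += $2 } END { print s }' <<'EOF'
Slab:            10287240 kB
KernelStack:       312000 kB
PageTables:       1989464 kB
EOF
)
echo "kernel structures: ${sum_kb} kB"           # 12588704 kB, roughly 12 GB
echo "per container:     $((sum_kb / 1600)) kB"  # ~7.7 MB each at ~1600 containers
```

That works out to roughly 12 GB of kernel memory, or on the order of 8 MB per container, which is comparable to the ~10 MB of userspace RSS each container uses. Notably, SUnreclaim alone is 4.3 GB, consistent with Serge's suggestion that per-netns kernel allocations, not userspace memory, are what scale with container count here.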