[ Mark, apologies, this list is subscriber-post only. But I wanted to CC you to acknowledge your input. ]
On Thu, Dec 17, 2009 at 10:12:30AM +0000, Robin Bowes wrote:
> On 17/12/09 09:30, Per Jessen wrote:
> > Robin Bowes wrote:
> >
> >> I'm actually using OpenNMS, so yes, I have monitoring software in
> >> place. However, it was my understanding that LVS CPU usage didn't
> >> register on normal monitoring tools, i.e. "top" doesn't show LVS CPU
> >> usage. If I am mistaken, then please tell me as that will answer my
> >> question!
> >
> > LVS isn't a process, it isn't scheduled by the dispatcher. You might as
> > well be asking how much CPU your network routing and packet forwarding
> > takes.
>
> That's pretty much my problem - LVS isn't a process, so how do I monitor it?
>
> Maybe CPU is the wrong metric, but throwing 500,000,000 connections/day
> through a server has *got* to have some impact on its resources, and I'm
> looking for the right way to monitor how the server is coping.
>
> All suggestions gratefully received!

Hi Robin,

a recent email from Mark Bergsma caused me to rediscover oprofile. It's a
tool for tracking the CPU usage of code both inside and outside the kernel,
and that covers LVS. Here is what I have been doing recently to profile LVS.
Apologies for any errors in the descriptions of the commands; I'm a bit
rusty on oprofile.
# Reset oprofile, otherwise data previously collected may be
# merged into this run
opcontrol --reset

# Start profiling
opcontrol --start

# Do something interesting, like throw lots of connections at LVS.
# I've been using http://www.linuxvirtualserver.org/julian/testlvs-0.1.tar.gz
# Real traffic would also work.

# Stop profiling
opcontrol --stop

# See which "apps" used more than 1% of CPU time
opreport -t 1
Overflow stats not available
CPU: Core 2, speed 2825.97 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a
unit mask of 0x00 (Unhalted core cycles) count 100000
CPU_CLK_UNHALT...|
  samples|      %|
------------------
    19744 69.1268 ip_vs
     7696 26.9449 vmlinux
      507  1.7751 e1000e
      201  0.7037 bash
      172  0.6022 libc-2.9.so
       91  0.3186 ld-2.9.so
       77  0.2696 oprofiled
       13  0.0455 oprofile
       12  0.0420 ext3
        9  0.0315 jbd
        6  0.0210 libncurses.so.5.7
        6  0.0210 processor
        6  0.0210 gawk
        3  0.0105 libpam.so.0.81.12
        3  0.0105 sudo

# Get some detail about functions that used more than 1% of the
# CPU time that was used by the top three "apps".
#
# vmlinux is the kernel. ip_vs, e1000e, oprofile and processor are
# kernel modules (though that isn't obvious from the output).
# Everything else on that list is user-space (unless I missed something).
#
# I need -p to tell oprofile where modules are; your mileage may vary there.
# vmlinux needs to be in the working directory
opreport -p /lib/modules/$(uname -r)/kernel/ -l -t0.1 ip_vs vmlinux e1000e
Overflow stats not available
CPU: Core 2, speed 2825.97 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a
unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        app name   symbol name
19157    68.5476  ip_vs.ko   __ip_vs_conn_in_get
 1010     3.6140  vmlinux    ip_route_input
  783     2.8017  vmlinux    rt_worker_func
  511     1.8285  vmlinux    read_hpet
  508     1.8177  vmlinux    clflush_cache_range
  347     1.2416  vmlinux    free_block
  309     1.1057  ip_vs.ko   ip_vs_conn_out_get
  305     1.0914  vmlinux    iommu_flush_write_buffer

I think we've found the elephant in the room! In this case the connection
hash is being overloaded: it has only 4096 buckets, but there are 1,000,000
entries. As Mark observed, increasing the tab bits from 12 to 18, and thus
the buckets from 4096 to 262,144, helps a lot here.

_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - [email protected]
Send requests to [email protected]
or go to http://lists.graemef.net/mailman/listinfo/lvs-users
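[Editor's sketch, not part of the original post.] The hash-table arithmetic behind that fix can be checked in a few lines of shell. This assumes the ~1,000,000 connection entries seen in the test above, and that the bucket count is 2^(tab bits), with the exponent fixed at kernel build time by the IP_VS_TAB_BITS (CONFIG_IP_VS_TAB_BITS) config option; check your kernel's Kconfig before relying on the option name.

```shell
# Average chain length in the IPVS connection hash: entries / buckets,
# where buckets = 2^(tab bits). Lookups such as __ip_vs_conn_in_get walk
# one bucket's chain, so their cost tracks the average chain length.
entries=1000000
echo "12 tab bits: $(( entries / (1 << 12) )) entries/bucket"   # prints 244
echo "18 tab bits: $(( entries / (1 << 18) )) entries/bucket"   # prints 3
```

A chain of ~244 entries per lookup versus ~3 is consistent with __ip_vs_conn_in_get dominating the profile above, and with the large improvement after raising the tab bits.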
