On Tue, 2013-04-16 at 06:16 +0000, Xavier Trilla wrote:
> Hi,
>
> This is the first time I post here, because I like to find solutions by
> myself. But this time I'm running out of ideas. (Well, the reality is that
> we are running out of time, as at some point our boss will run out of
> patience if we don't manage to deliver some results :P )
>
> Our problem is that we are not able to replicate the performance we got
> with a specific kernel one of my colleagues built once. He actually built
> that kernel without paying much attention to the options he was using (it
> was a "fast and dirty" build, and now we are paying the consequences!),
> and it seems he was extremely lucky (or inspired) that day, as we cannot
> reproduce the performance that specific kernel delivers.
>
> So after 3 weeks of running tests we decided that maybe it was about time
> to ask, so here we are :)
>
> OK, so let's begin with the test lab we have set up (I will give you some
> hardware details, but keep in mind that one kernel delivers about 3x the
> performance of the others with the exact same HW configuration):
>
> Router:
> MB: SuperMicro X8DTN+F
> CPUs: 2 x Xeon 5620
> LANs: Integrated Intel 82576 Dual-Port Gigabit Ethernet Controller
> HyperThreading disabled
> igb driver load parameters: IntMode=2 InterruptThrottleRate=0,0
>   QueuePairs=0,0 RSS=4,4
> irqbalance disabled (SMP affinity changed for the RSS queues)
> All queues bound to the second CPU (one RSS queue of each adapter bound
>   to each core)
> rp_filter disabled
> ip_forwarding enabled
> iptables modules are NOT loaded
> The machine is just doing IP forwarding across two interfaces
>
> Basically everything else is at its defaults, as we wanted to remove as
> many variables as possible.
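For anyone trying to reproduce a setup like the one described, the steps could be sketched roughly as below. The interface names (eth0/eth1), the core numbers, and the IRQ-name pattern in /proc/interrupts are assumptions, not details from the original post; check your own /proc/interrupts before pinning anything.

```shell
#!/bin/sh
# Rough sketch of the router tuning described above; interface names,
# core numbers, and the IRQ-name pattern are assumptions for your HW.

# Load igb with MSI-X (IntMode=2), interrupt throttling off,
# unpaired Tx/Rx queues, and 4 RSS queues per port.
modprobe igb IntMode=2 InterruptThrottleRate=0,0 QueuePairs=0,0 RSS=4,4

# Disable reverse-path filtering, enable IP forwarding.
sysctl -w net.ipv4.conf.all.rp_filter=0
sysctl -w net.ipv4.ip_forward=1

# Stop irqbalance so the manual SMP affinity below sticks.
service irqbalance stop

# Pin each RSS queue IRQ to a core of the second CPU package
# (assumed here to be logical CPUs 4-7 with HyperThreading off).
cpu=4
for irq in $(awk -F: '/eth[01]-/ {gsub(/ /,"",$1); print $1}' /proc/interrupts)
do
    printf '%x' $((1 << cpu)) > /proc/irq/$irq/smp_affinity
    cpu=$(( cpu == 7 ? 4 : cpu + 1 ))
done
```

The smp_affinity file takes a hexadecimal CPU bitmask, so core 4 is written as `10`, core 5 as `20`, and so on.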
>
> Receivers/Generators:
> Xeon 5620 machines using BoNeSi as the packet generator (64-byte UDP
> packets with 50k source addresses)
>
> And here comes the interesting part: in this scenario, using kernel
> 2.6.32.27 with igb driver 4.1.2 we manage to get around 1.5 Mpps, but
> with every other kernel we tried the maximum we get is less than
> 750 Kpps. So far we have tried kernels 3.0.73, 3.2.43 and 3.4.40 with no
> success. (We still need to try 2.6.34.14, and we are solving a problem
> with 2.6.32.60 because it doesn't boot; it is probably related to our
> LSI RAID controller.)
>
> While investigating this issue (keep in mind that we are more
> networking/sysadmin guys; we may have quite good knowledge of Linux,
> but we are really far away from you guys when it comes to the kernel
> and networking drivers), the only difference we managed to find has
> been with "perf top" on the machine, comparing 2.6.32.27 and the other
> 3.x kernels:
>
> Kernel 2.6.32.27:
>
> The top consuming function is "igb_poll". As I understand it, when the
> network is under heavy load the kernel starts operating the interface in
> NAPI polling mode, so everything seems to be normal and performance is
> really good.
>
> Kernel 3.4.40 (we have seen similar behaviour on the other kernels):
>
> Here things look completely different, and _raw_spin_lock_irqsave is
> consuming 58% of the resources. (Quite big, isn't it?)
>
> With my really limited understanding I guess this is a function that
> spins, and that might be the reason for the performance difference among
> kernels. But as _raw_spin_lock_irqsave is a commonly called function, we
> are not close at all to identifying the real cause of the performance
> degradation or how to avoid it.
>
> So, does anybody have any idea why we see this massive difference in
> performance? (Or at least an idea that could lead us to the answer...)
>
> And a few more questions (just in case nobody knows the answer to the
> previous one):
>
> - Do you think it is kernel- or driver-related? (We realized the igb
>   driver configures itself depending on the kernel version, so we are
>   not sure.)
> - Any extremely important parameters we might be forgetting when
>   compiling the kernel?
> - Any documentation you think we should read? (BTW, we have seen Intel's
>   results when forwarding packets with Nehalem CPUs... but some
>   information about how you achieve those astonishing results would be
>   really appreciated :))
>
> Thanks for your time!
Use "perf record -a -g sleep 10 ; perf report" instead of "perf top": that way we'll catch the call graphs.

_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired
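A minimal version of the suggested workflow, for reference (the 10-second window and the --stdio output option are just examples; run as root on the router while the generators are pushing traffic):

```shell
# Sample all CPUs system-wide (-a) for 10 seconds, recording call
# graphs (-g) so perf report can show who is taking the contended lock,
# not just that _raw_spin_lock_irqsave is hot.
perf record -a -g sleep 10

# Interactive TUI: expand an entry to walk its callers/callees.
perf report

# Or dump a plain-text report for pasting into a mail:
perf report --stdio > perf-report.txt
```

The call-graph data is what distinguishes this from "perf top": it should show which code path (e.g. a driver or network-stack function) is acquiring the spinlock that dominates the profile.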
