On Tue, 2013-04-16 at 06:16 +0000, Xavier Trilla wrote:
> Hi,
> 
> This is the first time I post here, because I like to find solutions by 
> myself. But this time I'm running out of ideas. (Well, the reality is that 
> we are running out of time, as at some point our boss will run out of 
> patience if we don't manage to deliver some results :P )
> 
> Our problem is that we are not able at all to replicate the performance we 
> got with a specific kernel one of my colleagues built once. Actually he 
> built that kernel without paying much attention to the options he was using 
> (it was a "quick and dirty" build, and now we are paying the consequences!) 
> and it seems he was extremely lucky (or inspired) that day, as we cannot 
> reproduce the performance that specific kernel delivers.
> 
> So after 3 weeks of running tests we decided that maybe it was about time 
> to ask, so here we are :)
> 
> Ok, so let's begin with the test lab we have set up (I will give you some 
> hardware details, but keep in mind that one kernel delivers about 3x the 
> performance of the others with the exact same HW configuration):
> 
> Router: 
> MB: SuperMicro X8DTN+F
> CPUs: 2 x Xeon 5620
> LANs: Integrated Intel 82576 Dual-Port Gigabit Ethernet Controller
> HyperThreading Disabled
> IGB Driver load parameters: IntMode=2 InterruptThrottleRate=0,0 
> QueuePairs=0,0 RSS=4,4
> IRQ Balance Disabled (SMP affinity changed for RSS queues)
> All queues bound to the second CPU (one RSS queue of each adapter bound to 
> each core)
> Rp_filter Disabled
> Ip_forwarding Enabled
> Iptables modules are NOT loaded
> Machine is just doing IP forwarding across two interfaces
> 
> And basically all the rest is almost default, as we wanted to remove as many 
> variables as possible.
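> For reference, the affinity setup above boils down to something like the 
> following rough sketch (not our exact script; cores 4-7 for the second CPU 
> and the IRQ placeholder are assumptions, the real IRQ numbers come from 
> /proc/interrupts):

```shell
# Compute the hex CPU bitmasks written to /proc/irq/<N>/smp_affinity
# when pinning one RSS queue per core. Core 4 -> mask 0x10, core 5 ->
# 0x20, and so on. Replace <IRQ> with the queue's IRQ number from
# /proc/interrupts, and keep irqbalance disabled so it does not undo this.
for core in 4 5 6 7; do
    mask=$(printf '%x' $((1 << core)))
    echo "core $core -> mask $mask   (echo $mask > /proc/irq/<IRQ>/smp_affinity)"
done
```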
> 
> Receivers/Generators: 
> Xeon 5620 machines using BoNeSi as packet generator (UDP, 64-byte packets, 
> with 50k source addresses)
> 
> And here comes the interesting part: in this scenario, using kernel 
> 2.6.32.27 with igb driver 4.1.2 we manage to get around 1.5 Mpps, but with 
> all the other kernels we tried the maximum we get is less than 750 Kpps. 
> So far we have tried kernels 3.0.73, 3.2.43 and 3.4.40, with no success. 
> (We still need to try 2.6.34.14, and we are solving a problem with 
> 2.6.32.60 because it doesn't boot; probably a problem related to our LSI 
> RAID controller.)
> 
> While investigating this issue (keep in mind that we are more 
> networking/sysadmin guys; and yes, we may have quite good knowledge of 
> Linux, but we are really far away from you guys when it comes to the kernel 
> and networking drivers), the only way we managed to find a difference has 
> been using "perf top" on the machine while running 2.6.32.27 and the other 
> 3.x kernels, and the main difference we found has been:
> 
> Kernel 2.6.32.27: 
> 
> The top consuming function is "igb_poll". As I understand it, since the 
> network is under heavy load the kernel starts operating the interface in 
> NAPI polling mode, so everything seems to be normal and performance is 
> really good.
> 
> Kernel 3.4.40 (we have seen similar behaviour on other kernels):
> 
> Here things look completely different, and _raw_spin_lock_irqsave is 
> consuming 58% of the resources. (Quite big, isn't it?)
> 
> With my really limited understanding, I guess this is a function that 
> spins, and that might be the reason for the performance difference among 
> kernels. But as _raw_spin_lock_irqsave is a commonly called function, we 
> are not close at all to identifying the real cause of the performance 
> degradation and how to avoid it.
> 
> So, does anybody have any idea about why we see this massive difference in 
> performance? (Or at least an idea that could lead us to the answer...)
> 
> And a few more questions (just in case nobody knows the answer to the 
> previous question):
> 
> - Do you think it is kernel or driver related? (We realized the igb driver 
> configures itself depending on the kernel version, so we are not sure.)
> - Any extremely important parameters when compiling the kernel that we 
> might be forgetting?
> - Any documentation you consider we should read? (BTW, we have seen Intel's 
> results when forwarding packets with Nehalem CPUs... but some information 
> about how you achieve those astonishing results would be really 
> appreciated :))
> 
> Thanks for your time!

Use "perf record -a -g sleep 10 ; perf report" instead of "perf top":
we'll catch the call graphs.
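Spelled out, that is roughly the following (the 10-second window is
arbitrary; run it while the forwarding test is under full load, and note
that perf record generally needs root for system-wide sampling):

```shell
# Record system-wide (-a) samples with call graphs (-g) for 10 seconds
# while traffic is flowing. "sleep 10" is just the workload perf waits
# on; the -a flag makes it sample every CPU regardless.
perf record -a -g sleep 10

# Browse the result interactively; expanding the _raw_spin_lock_irqsave
# entry will show which callers are actually contending on the lock.
perf report
```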





_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired
