Hey Radim,

Sorry for the late response. I've been swamped lately and just haven't gotten a chance to get to your email.

The first thing that jumps to mind for me is whether your NICs are getting a full x8 PCIe connection on your bus. This is easy enough to check, if you haven't already, by looking in the system log when the driver is loaded, or at an lspci dump so we can see your bus configuration. Another point of interest would be the complete output from ethtool -S. Are the rx_missed_errors the only stat of note?
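Off the top of my head, something along these lines should show both; the interface name eth0 is just a placeholder for whichever port you're testing:

    dmesg | grep -i ixgbe
        # the probe messages should include the PCIe link the driver negotiated
    lspci -vvv | grep -E "82599|LnkSta:"
        # run as root; the LnkSta line for the 82599 functions should read
        # "Speed 5GT/s, Width x8" if the slot is really giving you full bandwidth
    ethtool -S eth0
        # please send the complete dump, not just the missed counter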
Assuming that isn't the issue, it's worth noting that rx_missed_errors are normal when a NAPI driver is receiving packets faster than the system can handle them. In this case, increasing the FIFO buffer will only slightly increase the time before you start dropping packets again. If you wanted to experiment, you could turn off interrupt moderation by loading the driver with "insmod ixgbe.ko InterruptThrottleRate=0".
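If you do try that, a rough sketch of the experiment (the module path is just an example; adjust for wherever you built the driver):

    rmmod ixgbe
    insmod ./ixgbe.ko InterruptThrottleRate=0
        # bring the ports back up afterwards (ifconfig/ip), then restart pktgen
    watch -d 'ethtool -S eth0 | grep rx_missed_errors'
        # see whether the missed counter still climbs at the same rate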
Thanks,
-Don <[email protected]>

>-----Original Message-----
>From: [email protected] [mailto:[email protected]]
>Sent: Sunday, May 22, 2011 11:19 AM
>To: [email protected]
>Subject: Re: [E1000-devel] ixgbe 3.3.9 - NIC 82599 - significant packet lost
>
>Hi,
>
>if this question is just stupid and therefore not worthy of your time, please
>just tell me so...
>
>I was trying to find some reason in the behavior... I was reading the 82599
>datasheet, and the reasons for rx_missed_errors, as written in the previous
>mail, are:
>1) insufficient buffers allocated
>2) insufficient bandwidth on the IO bus
>
>Why does it not make sense to me? Here is how I understand it...
>
>2)
>The NICs are connected to the PCIe 2.0 bus. I used a second NIC and connected
>it to the second IO hub (different PCIe slot) with no performance impact.
>
>*1) Is it possible the NIC does not make it to store incoming packets into
>host memory? That the DMA cannot store more than 4 Mpps from the RX FIFO into
>host memory??*
>
>Why do I ask that?
>The controller writes back the receive descriptor immediately following the
>packet write into system memory. When there are no free descriptors, further
>packets might be either dropped or further RX FIFO reception may be disabled -
>ok - this is what is happening...
>
>The RX FIFO buffer per port is 512 kB. That gives the NIC space to buffer
>8192 64B packets. Each one of the 16 queues can buffer 512 64B packets.
>Maximum throughput is approx 14 Mpps, which gives us approx 900 000 pps per
>queue. The receive DMA stores each packet from the RX FIFO into system memory,
>to the location given by the appropriate host memory ring.
>The Rx descriptor buffer can hold by default up to 512 packets, so the
>limiting factor for reception is only the ring descriptor buffer. I increased
>it with:
>ethtool -G eth0 rx 4096
>But that does not influence performance, therefore the bottleneck is probably
>not directly insufficient space in the ring buffers in memory.
>
>Thanks
>
>Radim
>
>
>On Thu, May 19, 2011 at 12:31 PM, [email protected] <
>[email protected]> wrote:
>
>> Is there any way to profile what the driver is doing at a low level?
>> ...oprofile is probably too high-level, although it could help.
>>
>> I've gone through the source code: rx_missed_errors is a counter that sums
>> up the RXMPC stats register... the reasons mentioned there are
>>
>> 1) insufficient buffers allocated
>> 2) insufficient bandwidth on the IO bus
>>
>> ad 1) ethtool -G eth0 rx 4096 does not solve this issue at all - if I
>> understand correctly, this should increase the rx ring buffers 8 times =>
>> then the problem is how often the DMA stores these buffers into operating
>> memory - but I cannot modify any other coalesce parameters to see if that
>> can be helped..
>>
>> ad 2) no idea how to see the utilization of the PCI-Express bus :/...
>> google did not help with this :). I've got a Supermicro X8DAH motherboard
>> with enough PCI-E 2.0 slots... the Intel card is plugged into an x8 lane
>> slot:
>>         LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive-
>> BWMgmt- ABWMgmt-
>>
>> Unidirectional bandwidth should be 32 Gbps - let's say the PCI-E protocol
>> has 20% overhead; that still gives us much more than 20 Gbps for one
>> direction => 40 Gbps full duplex.
>>
>> I guess we don't have to discuss whether QPI is sufficient for one 10GE
>> port :)
>>
>> thanks for any hints... it's just killing me ;)
>>
>> Radim
>>
>>
>> On Thu, May 19, 2011 at 1:02 AM, [email protected] <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> I already don't know where to ask... so I will try it here :)... my
>>> problem is called *rx_missed_errors* :), I've spent days trying to tune
>>> it somehow, but still with no success.
>>>
>>> I've got 2 pretty nice computers - NUMA - 2x Xeon 5620 (quad-core)... 2x
>>> dual-port 10GE NIC - Intel 82599 controller..
>>>
>>> Let's imagine a very simple scenario:
>>>
>>> generator - sink
>>>
>>> where generator and sink are computers running Linux 2.6.39 with the
>>> 3.3.9 ixgbe driver.
>>>
>>> Using pktgen I generate 64B packets... let's say 10 Mpps at the receiving
>>> port at the sink.
>>>
>>> I generated 100M packets.
>>>
>>> smp affinity is configured
>>> flow control is off
>>>
>>> At the sink, check ethtool -S eth0:
>>>
>>> NIC statistics:
>>>      rx_packets: 68097737
>>>      rx_missed_errors: 31902263
>>>      rx_pkts_nic: 68097737
>>>
>>> Received packets are nicely balanced between the 16 Rx queues... but 31M
>>> packets are lost. CPUs are idle 90% of the time (you can check the
>>> attached mpstat-rx.txt).
>>>
>>> I wanted to tune interrupt coalescing a bit - but ethtool -C eth0 does
>>> not allow me to set anything other than rx-usecs - I've increased it, but
>>> with no luck.
>>>
>>> So my questions are:
>>>
>>> 1) is there any way to tune interrupt moderation?
>>>
>>> 2) Am I missing something?? I would expect that since all cores are
>>> mostly idle, there should be a way to tune the driver so it actually
>>> performs well even under heavy load with 64B packets.
>>>
>>> 3) Another scenario is generator - (eth0)bridge - sink.... in this case
>>> there is 84% packet loss!! at the receiving interface, and the CPU cores
>>> are still mostly idle (90%)
>>>
>>>
>>> Please, if you could help me a bit... I would be very happy :). It's
>>> almost a matter of life..
>>>
>>> Thanks
>>> Radim
>>>
>>>
>>> --
>>> Radim Roška
>>
>>
>>
>> --
>> Radim Roška
>
>
>
>--
>Radim Roška
