Hi,
if this question is just stupid and therefore not worth your time, please
tell me so...
I was trying to find some reason in the behavior... I was reading the 82599
datasheet, and the reasons for rx_missed_errors, as written in the previous
mail, are
1) insufficient buffers allocated
2) insufficient bandwidth on the IO bus
Why does neither of them make sense to me? Here is how I understand it...
2)
The NICs are connected to the PCIe 2.0 bus. I used the second NIC and connected
it to the second IO hub (a different PCIe slot) with no performance impact.
1) *Is it possible the NIC does not manage to store incoming packets into host
memory? That the DMA cannot store more than 4 Mpps from the RX FIFO into host
memory?*
Why do I ask that?
The controller writes back the receive descriptor immediately following the
packet write into system memory. When there are no free descriptors, further
packets may be dropped or reception into the RX FIFO may stop - ok - this is
what is happening...
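
Just to be sure I am looking at the right counters: the ixgbe driver seems to
expose the two failure modes separately in ethtool -S (counter names as I see
them here, so treat the exact names as an assumption):

ethtool -S eth0 | egrep 'rx_missed_errors|rx_no_dma_resources'
# rx_missed_errors    - the packet buffer (RX FIFO) on the NIC overflowed (MPC)
# rx_no_dma_resources - a packet arrived while no free RX descriptor was available (RNBC)
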
The RX FIFO buffer per port is 512 kB. That gives the NIC space to buffer 8192
64B packets, i.e. each of the 16 queues can buffer 512 64B packets. The maximum
throughput is approx. 14 Mpps, which gives us approx. 900,000 pps per queue.
The receive DMA stores each packet from the RX FIFO into system memory, at the
location given by the appropriate host memory ring.
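
A quick back-of-the-envelope check of the same numbers (assuming 14.88 Mpps
line rate for 64B frames and a perfectly even spread over the 16 queues):

PPS_PER_QUEUE=$((14880000 / 16))                  # ~930 000 pps per queue
FILL_TIME_US=$((512 * 1000000 / PPS_PER_QUEUE))   # ~550 us to exhaust 512 descriptors
echo "per-queue rate ${PPS_PER_QUEUE} pps, default ring lasts ~${FILL_TIME_US} us"

So at line rate each queue has only about half a millisecond of slack in its
512 default descriptors before packets start being missed.
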
The Rx descriptor ring can hold up to 512 packets by default, so the limiting
factor for reception should only be the descriptor ring. So I increased it
with:
ethtool -G eth0 rx 4096
But that does not influence performance, therefore the bottleneck is probably
not simply insufficient space in the ring buffers in memory.
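
(For completeness, this is the sequence I mean, including checking that the new
size really took effect - just the standard ethtool ring calls:)

ethtool -g eth0           # show pre-set maximums and current ring sizes
ethtool -G eth0 rx 4096
ethtool -g eth0           # verify the current RX ring size is now 4096
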
Thanks
Radim
On Thu, May 19, 2011 at 12:31 PM, [email protected] <
[email protected]> wrote:
> Is there any way to profile what the driver is doing at a low level?
> Oprofile is probably too high-level, although it could help.
>
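> One hedged idea for the low-level part: perf (assuming it is available on
> this 2.6.39 build) can at least show which ixgbe/kernel symbols the CPUs
> spend their time in while the traffic is running:
>
> perf record -a -g sleep 10     # sample all CPUs for 10 s under load
> perf report --sort symbol      # look for ixgbe_*, net_rx_action, NAPI symbols
>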
> I've gone through the source code: rx_missed_errors is a counter that sums up
> the RXMPC stats register... the reasons mentioned there are
>
> 1) insufficient buffers allocated
> 2) insufficient bandwidth on the IO bus
>
> ad 1) ethtool -G eth0 rx 4096 does not solve this issue at all - if I
> understand correctly, this should increase the rx ring buffers 8 times => then
> the question is how often the DMA stores these buffers into main memory - but
> I cannot modify any other coalesce parameters to see if that can be helped..
>
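> Concretely, the only knob I can reach through ethtool is something like this
> (the 100 us value is just an example):
>
> ethtool -c eth0                 # show the current interrupt coalescing settings
> ethtool -C eth0 rx-usecs 100    # raise the moderation interval (example value)
>
> (If I read the out-of-tree driver's documentation right, it also takes an
> InterruptThrottleRate module parameter, but I have not confirmed that on
> 3.3.9.)
>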
> ad 2) I have no idea how to see the utilization of the PCI-Express bus :/...
> Google did not help with this :). I've got a Supermicro X8DAH motherboard with
> enough PCI-E 2.0 slots... the Intel card is plugged into an x8 lane slot:
>         LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>
> Unidirectional bandwidth should be 32 Gbps - let's say the PCIe protocol has
> 20% overhead, which still gives us much more than 20 Gbps in one direction =>
> roughly 40 Gbps full duplex.
>
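> Putting rough numbers on it (the line-rate figure is the usual 14.88 Mpps for
> 64B frames; this is just a sanity check, not a measurement):
>
> RAW_GBPS=$((5 * 8 * 8 / 10))                    # 5 GT/s x 8 lanes, 8b/10b -> 32 Gbps raw
> DATA_GBPS=$((64 * 8 * 14880000 / 1000000000))   # 64B x 14.88 Mpps -> ~7 Gbps of packet data
> echo "raw ${RAW_GBPS} Gbps vs ~${DATA_GBPS} Gbps of payload"
>
> So raw bandwidth is clearly not the issue; at 64B each packet also costs TLP
> headers and a descriptor write-back, so if the bus were the problem it would
> show up as a packets-per-second limit rather than a Gbps limit.
>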
> I guess we don't have to discuss whether QPI is sufficient for one 10GE port :)
>
> Thanks for any hints... it's just killing me ;)
>
> Radim
>
>
> On Thu, May 19, 2011 at 1:02 AM, [email protected] <
> [email protected]> wrote:
>
>> Hi,
>>
>> I no longer know where to ask... so I will try it here :). My problem is
>> called *rx_missed_errors* :). I've spent days trying to tune it somehow, but
>> still with no success.
>>
>> I've got 2 pretty nice computers - NUMA, 2x Xeon 5620 (quad-core), 2x
>> dual-port 10GE NICs with the Intel 82599 controller.
>>
>> Let's imagine a very simple scenario:
>>
>> generator - sink
>>
>> where the generator and the sink are computers running Linux 2.6.39 with the
>> 3.3.9 ixgbe driver.
>>
>> Using pktgen I generate 64B packets... let's say 10 Mpps towards the
>> receiving port at the sink.
>>
>> I generated 100M packets.
>>
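>> (For reference, the pktgen setup is the standard /proc/net/pktgen interface,
>> roughly as below - the interface name, destination IP and MAC are placeholders:)
>>
>> modprobe pktgen
>> echo "rem_device_all"  > /proc/net/pktgen/kpktgend_0
>> echo "add_device eth0" > /proc/net/pktgen/kpktgend_0
>> echo "pkt_size 60"     > /proc/net/pktgen/eth0    # 60B + 4B CRC = 64B on the wire
>> echo "count 100000000" > /proc/net/pktgen/eth0
>> echo "dst 10.0.0.2"    > /proc/net/pktgen/eth0    # placeholder sink IP
>> echo "dst_mac 00:11:22:33:44:55" > /proc/net/pktgen/eth0   # placeholder sink MAC
>> echo "start"           > /proc/net/pktgen/pgctrl
>>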
>> smp affinity is configured
>> flow control is off
>>
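>> (By "smp affinity is configured" and "flow control is off" I mean roughly the
>> usual steps - the IRQ number and mask below are just example values:)
>>
>> ethtool -A eth0 rx off tx off         # disable pause frames on the receiving port
>> echo 2 > /proc/irq/123/smp_affinity   # pin each eth0-TxRx-N IRQ to its own core
>>                                       # (123 and the mask are examples; irqbalance
>>                                       #  is stopped while doing this)
>>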
>> At the sink, check ethtool -S eth0:
>>
>> NIC statistics:
>> rx_packets: 68097737
>> rx_missed_errors: 31902263
>> rx_pkts_nic: 68097737
>>
>> Received packets are nicely balanced between the 16 Rx queues... but 31M
>> packets are lost. The CPUs are 90% idle (you can check the attached
>> mpstat-rx.txt).
>>
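>> (The per-queue balance is from the per-queue counters, i.e. something like
>> the following - counter names as the driver here prints them:)
>>
>> ethtool -S eth0 | grep 'rx_queue_.*_packets'   # one rx_queue_N_packets line per queue
>>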
>> I wanted to tune interrupt coalescing a bit - but ethtool -C eth0 does not
>> allow me to set anything other than rx-usecs. I've increased it, but with no
>> luck.
>>
>> So my questions are:
>>
>> 1) Is there any way to tune interrupt moderation?
>>
>> 2) Am I missing something?? I would expect that, since all cores are mostly
>> idle, there should be a way to tune the driver so it actually performs well
>> even under heavy load with 64B packets.
>>
>> 3) Another scenario is generator - (eth0)bridge - sink... in this case there
>> is 84% packet loss!! at the receiving interface, and the CPU cores are still
>> mostly idle (90%).
>>
>>
>> If you could help me a bit, I would be very happy :). It's almost a matter
>> of life...
>>
>> Thanks
>> Radim
>>
>>
>>
>>
>>
>> --
>> Radim Roška
>
>
>
>
> --
> Radim Roška
>
--
Radim Roška