On 04/30/2012 02:02 AM, Moris Bangoura wrote:
> Hi,
>
> we are working with modified ixgbe drivers (packetshader - ixgbe
> 2.0.38.2, netmap - ixgbe 3.9.15) that allow receiving/sending small
> frames in wirespeed.
>
> In our lab we use 2 CPU NUMA architecture (Xeon CPU, Intel 5520
> chipset), 2x dual 10GbE 82599 cards.
>
> There is a problem with receiving small frames with length, that is not
> multiple of 64B (without or with 4B CRC, depending if RDRXCTL.CRCStrip
> and HLREG0.RXCRCSTRP register is set to 1 or 0).
>
> We suspect, that problem is somewhere in 82599 DMA engine, Intel 5520
> IOH, QPI or CPU cache line.
>
> What we discovered:
>
> 1. If CRCStrip reg is set to 1:
> - RX of 60B(+4B CRC) frame is 6,9 Mpps (PCIe TLP payload is 60B)
> - RX of 64B(+4B CRC) frame is 14,2 Mpps (PCIe TLP payload is 64B) -> OK,
> wirespeed.
>
> 2. If CRCStrip reg is set to 0:
> - RX of 60B(+4B CRC) is < 14,8 Mpps (PCIe TLP payload is 64B) -> OK,
> wirespeed.
> - RX of 61B(+4B CRC) is < 6,9 Mpps (PCIe TLP payload is 65B)
>
> Is there some possible workaround, so 82599 DMA engine always aligns
> length of Memory Write Request payload to be multiple of 64B?
>
> Example:
> 0. 64B frame is received on Rx MAC with CRCStrip reg set to 1.
> 1. The receive DMA fetches the next RX descriptor from the appropriate
> host memory ring to be used for the next
> received packet.
> 2. The receive DMA posts the packet appended with 4B (so Memory Write
> Request payload length is multiple of 64B) to the location indicated by
> the RX descriptor through the PCIe interface.
> 3. When the packet is placed into host memory, the receive DMA updates
> all the RX descriptor(s) that were used by the
> packet data (real non-appended packet length is reported via PKT_LEN).
> 4. The receive DMA writes back the RX descriptor content along with
> status bits that indicate the packet information
> including what offloads were done on that packet.
> 5. 82599 initiates an interrupt indicating, that new packet is ready in
> host memory. The host reads packet data (only PKT_LEN indicated by RX
> descriptor).
>
> Maybe there is some 82599 RX DMA register/bit that is not covered by
> 82599 datasheet (version 2.75).
>
> Regards,
> Morris,

Are you seeing this issue with both the 2.0.38.2 and 3.9.15 drivers, or
is this mainly with 2.0.38.2? I just want to clarify since the 3.9.15
driver should be significantly more optimized than the 2.0.38.2 driver.

The behaviour you are describing sounds like an issue with partial cache
line writes. This is an issue for most architectures because a partial
write typically requires a read/modify/write cycle to update the cache
line, instead of being just a direct write as in the case of a full
cache line write.

The 3.9.15 driver contains several updates since 2.0.38.2 with regard to
partial cache line writes and will likely show much better performance.
Specifically, it will cut the number of partial cache line writes in half
by aligning the buffers with the start of a cache line.

The hardware itself doesn't contain any workarounds for this, but I would
recommend testing with the 3.9.15 driver instead of the 2.0.38.2 driver,
as it contains several software improvements that may help to improve
the performance.

Thanks,

Alex

_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired
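
To make the partial cache line write point from the reply concrete, here
is a minimal sketch in C that counts how many full and partial cache
lines a single DMA write touches. It assumes a 64-byte cache line;
count_lines() and struct line_count are hypothetical helpers written for
this illustration and are not part of either ixgbe driver.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size in bytes (must be a power of two). */
#define CACHE_LINE ((size_t)64)

/* Hypothetical result type: cache lines fully vs. partially written. */
struct line_count {
    unsigned full;
    unsigned partial;
};

/*
 * Count the cache lines touched by a write of `len` bytes that starts
 * `off` bytes past a cache-line-aligned address. A line counts as
 * "full" only if the write covers all CACHE_LINE bytes of it.
 */
static struct line_count count_lines(size_t off, size_t len)
{
    struct line_count c = {0, 0};
    size_t pos = off;
    size_t end = off + len;

    /* The mask arithmetic below relies on a power-of-two line size. */
    assert((CACHE_LINE & (CACHE_LINE - 1)) == 0);

    while (pos < end) {
        size_t line_start = pos & ~(CACHE_LINE - 1);
        size_t line_end = line_start + CACHE_LINE;
        size_t chunk_end = end < line_end ? end : line_end;

        if (pos == line_start && chunk_end == line_end)
            c.full++;      /* whole line written: direct write */
        else
            c.partial++;   /* part of a line: read/modify/write */

        pos = chunk_end;
    }
    return c;
}
```

With an aligned buffer, count_lines(0, 64) gives one full line and no
partial lines, while count_lines(0, 60) gives a single partial line and
count_lines(0, 65) gives one full plus one partial line, which matches
the payload sizes where the reported rate drops. With an illustrative
2-byte buffer offset (an assumption, not a statement about what either
driver does), count_lines(2, 65) gives two partial lines instead of one,
which is the kind of halving that aligning buffers to the start of a
cache line provides.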
