On 11/20/2012 05:05 PM, Ben Greear wrote:
> On 11/20/2012 04:34 PM, Ben Greear wrote:
>> On 11/20/2012 04:18 PM, Ben Greear wrote:
>
>>>> Also, have you checked to make sure the feature set is comparable?
>>>> For instance, the E5 can support VT-d. If that is enabled it can
>>>> have a negative impact on I/O performance due to extra locking
>>>> overhead for map/unmap calls on the host.
>>>
>>> I'll go poke around the BIOS and disable VT-d if I can find it.
>>
>> Wow, disabling VT-d gives a big improvement!
>>
>> It now runs around 9.3Gbps bi-directional. Still not as good as
>> our E3 or i7 systems, but it's at least closer.
>>
>> Here's the new perf top:
>>
>> Samples: 24K of event 'cycles', Event count (approx.): 15591201274
>>   10.61%  [ixgbe]         [k] ixgbe_poll
>>    6.76%  [pktgen]        [k] pktgen_thread_worker
>>    6.40%  [kernel]        [k] timekeeping_get_ns
>>    5.46%  [ixgbe]         [k] ixgbe_xmit_frame_ring
>>    4.20%  libc-2.15.so    [.] __memcpy_ssse3_back
>>    3.98%  [kernel]        [k] do_raw_spin_unlock
>>    3.02%  [kernel]        [k] skb_copy_bits
>>    2.99%  [kernel]        [k] build_skb
>>    2.61%  perf-2510.map   [.] 0x00007f73b2a28476
>>    2.56%  [kernel]        [k] __netif_receive_skb
>>
>> What CPU(s) do you suggest for high network bandwidth... hopefully
>> PCIe gen3 systems that can push beyond two 10G NICs at full speed?
>
> Well... I stepped away for a bit, and when I came back, it is now
> working very nicely. It can do full 10G tx+tx on two ports, even with
> pktgen using multi-skb at 0 (i.e., no skb cloning).
>
> And a bridging delay-emulator app of ours, which previously peaked at
> 6Gbps or so bi-directional on i7/E3 systems, runs 9Gbps+ even with a
> full 1 second of one-way delay (i.e., cold cache on the skbs).
>
> So, aside from the VT-d, I don't know why it was acting poorly
> earlier, but all seems good now. Maybe 'updatedb' or something
> similar was running and I didn't notice...
>
> I'll keep poking at this... and should have a 4-port 10G NIC with
> gen-3 PCIe coming soon to play with.
>
> Thanks,
> Ben
Based on the trace you provided earlier, it was all VT-d. When the
Intel IOMMU is enabled, every DMA map/unmap call has to walk a
spinlock-protected red-black tree to allocate or free an IOVA range.
The problem is that the lock doesn't scale with multiple queues: every
TX and RX queue on every core serializes on it, and I have seen it
cause significant contention on systems with many cores. (There's a
simplified sketch of the bottleneck below.)

Any of the newer Xeon E5 systems will have an advantage for network
workloads due to DDIO, which lets the NIC DMA packet data directly
into the last-level cache. Specifically, you should find that the I/O
scales quite well, since memory bandwidth stops being much of a
bottleneck, as long as the physical device, its interrupts, and the
traffic generator all stay on the same CPU socket (the second sketch
below shows one way to keep them together).
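To make the first point concrete, here's a rough userspace model of
the serialization. This is not the actual intel-iommu code; the bump
allocator standing in for the red-black tree, the names, and the
queue/iteration counts are all illustrative:

/* iova_contention.c -- build with: gcc -O2 -pthread iova_contention.c
 *
 * Simplified model of the IOMMU map path: every "queue" must take one
 * domain-wide spinlock to allocate an IOVA for each buffer it maps.
 */
#include <inttypes.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NQUEUES 8
#define NMAPS   1000000

static pthread_spinlock_t iova_lock;   /* one lock per IOMMU domain  */
static uint64_t next_iova = 0x1000;    /* stand-in for rb-tree state */

/* Stand-in for alloc_iova(): grab the domain lock, carve out a range. */
static uint64_t alloc_iova(uint64_t size)
{
        pthread_spin_lock(&iova_lock); /* all queues contend here    */
        uint64_t iova = next_iova;
        next_iova += size;
        pthread_spin_unlock(&iova_lock);
        return iova;
}

/* One thread per NIC queue, one "DMA map" per packet buffer. */
static void *queue_worker(void *arg)
{
        (void)arg;
        for (int i = 0; i < NMAPS; i++)
                (void)alloc_iova(2048);
        return NULL;
}

int main(void)
{
        pthread_t queues[NQUEUES];

        pthread_spin_init(&iova_lock, PTHREAD_PROCESS_PRIVATE);
        for (int i = 0; i < NQUEUES; i++)
                pthread_create(&queues[i], NULL, queue_worker, NULL);
        for (int i = 0; i < NQUEUES; i++)
                pthread_join(queues[i], NULL);
        printf("done, next_iova = 0x%" PRIx64 "\n", next_iova);
        return 0;
}

In the real driver the critical section is a red-black tree search
and insert rather than a pointer bump, so the lock is held longer and
the contention is correspondingly worse.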
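And for the affinity point, here is a minimal sketch of one way to
keep the generator process on the NIC-local socket. It assumes
libnuma is installed; the sysfs numa_node attribute is standard, but
the program itself and the device name "eth2" are just illustrative:

/* pin_to_nic_node.c -- build with: gcc -O2 pin_to_nic_node.c -lnuma
 *
 * Read which NUMA node the NIC sits on from sysfs, then restrict this
 * process (the traffic generator here) to that node's CPUs and memory
 * so DDIO keeps packet buffers in the local LLC.
 */
#include <numa.h>
#include <stdio.h>

int main(int argc, char **argv)
{
        const char *dev = argc > 1 ? argv[1] : "eth2";
        char path[128];
        int node = -1;

        snprintf(path, sizeof(path),
                 "/sys/class/net/%s/device/numa_node", dev);
        FILE *f = fopen(path, "r");
        if (!f || fscanf(f, "%d", &node) != 1 || node < 0) {
                /* sysfs reports -1 when the node is unknown */
                fprintf(stderr, "no NUMA info for %s; not pinning\n", dev);
                if (f)
                        fclose(f);
                return 1;
        }
        fclose(f);

        if (numa_available() < 0) {
                fprintf(stderr, "libnuma: NUMA unavailable\n");
                return 1;
        }
        numa_run_on_node(node);   /* schedule only on that socket     */
        numa_set_preferred(node); /* and prefer its memory for allocs */

        printf("%s is on node %d; generator pinned there\n", dev, node);
        /* ... launch the traffic-generator threads from here ... */
        return 0;
}

The interrupt side is the same idea: point each queue vector's
/proc/irq/<n>/smp_affinity at CPUs on that node (the set_irq_affinity
script included with the ixgbe source package can do that for you).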
Thanks,

Alex