Random curiosity: why would jumbo frames increase replies per second?

Regards,
KK

On 15 December 2010 11:45, Amin Tootoonchian <a...@cs.toronto.edu> wrote:
> I missed that. The single core throughput is ~250k replies/sec, two
> cores ~450k replies/sec, three cores ~650k replies/sec, four cores
> ~800k replies/sec. These numbers are higher than what I reported in my
> previous post. That is most probably because, right now, I am testing
> with MTU 9000 (jumbo frames) and with more user-space threads.
>
> Cheers,
> Amin
>
> On Wed, Dec 15, 2010 at 12:36 AM, Martin Casado <cas...@nicira.com> wrote:
>> Also, do you mind posting the single core throughput?
>>
>>> [cross-posting to nox-dev, openflow-discuss, ovs-discuss]
>>>
>>> I have prepared a patch based on NOX Zaku that improves its
>>> performance by a factor of >10. This implies that a single controller
>>> instance can run a large network with near a million flow initiations
>>> per second. I am writing to open up a discussion and get feedback from
>>> the community.
>>>
>>> Here are some preliminary results:
>>>
>>> - Benchmark configuration:
>>> * Benchmark: Throughput test of cbench (controller benchmarker) with
>>> 64 switches. Cbench is a part of the OFlops package
>>> (http://www.openflowswitch.org/wk/index.php/Oflops). Under throughput
>>> mode, cbench sends a batch of ofp_packet_in messages to the controller
>>> and counts the number of replies it gets back.
>>> * Benchmarker machine: HP ProLiant DL320 equipped with a 2.13GHz
>>> quad-core Intel Xeon processor (X3210), and 4GB RAM
>>> * Controller machine: Dell PowerEdge 1950 equipped with two 2.00GHz
>>> quad-core Intel Xeon processors (E5405), and 4GB RAM
>>> * Connectivity: 1Gbps
>>>
>>> - Benchmark results:
>>> * NOX Zaku: ~60k replies/sec (NOX Zaku only utilizes a single core).
>>> * Patched NOX: ~650k replies/sec (utilizing only 4 cores out of 8
>>> available cores). The sustained controller->benchmarker throughput is
>>> ~400Mbps.
>>>
>>> The patch updates the asynchronous harness of NOX to a standard
>>> library (boost asynchronous I/O library) which simplifies the code
>>> base. It fixes the code in several areas, including but not limited
>>> to:
>>>
>>> - Multi-threading: The patch enables having any number of worker
>>> threads running on multiple cores.
>>>
>>> - Batching: Serving requests individually and sending replies one by
>>> one is quite inefficient. The patch tries to batch requests together
>>> where possible, as well as replies (which reduces the number of system
>>> calls significantly).
>>>
>>> - Memory allocation: The standard C++ memory allocator is not robust
>>> in multi-threaded environments. Google's Thread-Caching Malloc
>>> (TCMalloc) or Hoard memory allocator perform much better for NOX.
>>>
>>> - Fully asynchronous operation: The patched version avoids wasting CPU
>>> cycles polling sockets, or event/timer dispatchers when not necessary.
>>>
>>> I would like to add that the patched version should perform much
>>> better than what I reported above (the number reported is with a run
>>> on 4 CPU cores). I guess a single NOX instance running on a machine
>>> with 8 CPU cores should handle well above 1 million flow initiation
>>> requests per second. Also having a more capable machine should help to
>>> serve more requests! The code will be made available soon and I will
>>> post updates as well.
>>>
>>>
>>> Cheers,
>>> Amin
>>> _______________________________________________
>>> openflow-discuss mailing list
>>> openflow-disc...@lists.stanford.edu
>>> https://mailman.stanford.edu/mailman/listinfo/openflow-discuss
>>
>
> _______________________________________________
> nox-dev mailing list
> nox-dev@noxrepo.org
> http://noxrepo.org/mailman/listinfo/nox-dev_noxrepo.org
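
A rough back-of-envelope on the figures quoted above, offered only as a sketch: ~650k replies/sec at ~400Mbps implies an average reply size on the wire, and from that one can estimate how many batched replies would fit into one full TCP segment at MTU 1500 versus MTU 9000. The 40-byte header figure (IPv4 + TCP without options), the function names, and the idea that replies are packed back-to-back into full segments are all assumptions made here for illustration; none of them is measured or stated in the thread, and the ~400Mbps figure may itself include protocol overhead.

```python
# Back-of-envelope only: average reply size implied by the quoted figures,
# and how many such replies fit in one full TCP segment at each MTU.

def bytes_per_reply(throughput_bps: float, replies_per_sec: float) -> float:
    """Average bytes per reply implied by a sustained throughput."""
    return throughput_bps / 8 / replies_per_sec


def replies_per_segment(mtu: int, reply_bytes: float, header_bytes: int = 40) -> int:
    """Whole replies fitting in one segment payload.

    header_bytes=40 is an assumption (IPv4 + TCP headers, no options).
    """
    return int((mtu - header_bytes) // reply_bytes)


# ~400 Mbps and ~650k replies/sec are the figures from the quoted post.
reply = bytes_per_reply(400e6, 650e3)

for mtu in (1500, 9000):
    print(f"MTU {mtu}: ~{reply:.0f} B/reply, "
          f"{replies_per_segment(mtu, reply)} replies per full segment")
```

Under these assumptions a reply averages roughly 77 bytes, so a full segment carries 18 replies at MTU 1500 and 116 at MTU 9000. If replies are batched as the patch description says, fewer packets per reply would mean less per-packet work on both machines, which is one possible contribution to the higher numbers; the thread does not confirm this.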