For curiosity's sake, I just ran the script shown below on my machine and got 3.34 Mpps. I am using the latest Click from git (as of two days ago) on the following CPU: Intel(R) Xeon(R) CPU W3520 @ 2.67GHz.
It seems that 0.5 Mpps is pretty low for an i7-870 CPU, but it does appear that the patches improved the performance significantly.

InfiniteSource -> ctr::AverageCounter -> Queue -> Discard;
Script(
    wait 60,
    print ctr.count,
    print ctr.byte_count,
);

Roman

On Fri, 1 Jul 2011 19:47:13 +0200 Luigi Rizzo <ri...@iet.unipi.it> wrote:
> If someone is interested in performance of userland click, i'd suggest
> the following two patches and looking at netmap (i already discussed
> what follows with Eddie, and i am hoping someone more fluent than
> me in C++ can polish the code and add support for thread-local lists).
>
> To get an idea of what you can get on a single core i7-870 CPU with
> the stock version and with these patches:
>
>                                        1.8.0      With patches
> InfiniteSource -> Discard              515Kpps    18.56Mpps
> InfiniteSource -> Queue -> Discard     500Kpps    13.41Mpps
>
>                                        pcap       netmap
> FromDevice->Queue->ToDevice            420Kpps    3.97Mpps
>
>
> Click userland performance was never a priority given the high cost
> (until now) of packet I/O. But once packet i/o has become quite fast,
> it turns out that there are two other big offenders:
> - the C++ memory allocator is quite expensive, and replacing it with
>   thread-local freelists (Packet objects and data buffers can be made
>   all with the same size) gives huge savings -- 100ns per packet or more
>   even on a fast machine;
>
> - every time an element wants a timestamp, it calls a syscall (gettimeofday()
>   or similar) which consumes another 400-800ns per call. There are many
>   elements (e.g. InfiniteSource, Counter, etc.) which timestamp packets.
>
> Attached there are a couple of patches which address these problems:
>
> - patch-pcap makes FromDevice and ToDevice use libpcap properly,
>   supporting I/O in bursts to amortize the syscall overhead.
>   This has been tested on FreeBSD.
>
> - patch-more
>   + introduces a NOTS option for InfiniteSource to remove timestamps.
>     This gives a 10x performance improvement in simple apps using
>     InfiniteSource
>
>   + replaces the allocator for Packet and data buffers with local freelists;
>     not thread safe, but this is easy to introduce. This gives another
>     1.5-2x speed improvement after the 10x gained removing timestamps;
>
>   + enables BURST operation in Discard, giving another 2x speed improvement
>
> Using netmap instead of pcap is another big win, as you can see the
> forwarding performance of a simple FromDevice->Queue->ToDevice chain
> goes up by 10x.
> You can find netmap at http://info.iet.unipi.it/~luigi/netmap/
>
> cheers
> luigi
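
For readers unfamiliar with the freelist idea in the quoted message, here is a minimal sketch of it. This is not the code from patch-more: BufferPool, alloc(), release() and heap_allocs() are hypothetical names chosen for illustration, and the sketch assumes every buffer has the same size, which the message says can be arranged for Packet objects and data buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical fixed-size buffer pool (illustration only, not Click code).
// Not thread safe: the intent is one pool per thread.
class BufferPool {
    struct Node { Node *next; };   // link stored inside each free buffer

public:
    explicit BufferPool(std::size_t buf_size)
        : _buf_size(buf_size < sizeof(Node) ? sizeof(Node) : buf_size),
          _head(nullptr), _heap_allocs(0) {}

    BufferPool(const BufferPool &) = delete;
    BufferPool &operator=(const BufferPool &) = delete;

    ~BufferPool() {
        // Hand every cached buffer back to the heap.
        while (_head) {
            Node *n = _head;
            _head = n->next;
            ::operator delete(static_cast<void *>(n));
        }
    }

    // Pop a buffer off the freelist; go to the heap only when it is empty.
    void *alloc() {
        if (_head) {
            Node *n = _head;
            _head = n->next;
            return static_cast<void *>(n);
        }
        ++_heap_allocs;
        return ::operator new(_buf_size);
    }

    // Push a buffer back onto the freelist. The pointer must have come
    // from alloc() on this same pool.
    void release(void *p) {
        assert(p != nullptr);
        _head = ::new (p) Node{_head};
    }

    // Number of times alloc() had to fall back to the heap.
    std::size_t heap_allocs() const { return _heap_allocs; }

private:
    std::size_t _buf_size;
    Node *_head;
    std::size_t _heap_allocs;
};
```

After a warm-up phase, alloc() and release() are a couple of pointer moves each and never touch the general-purpose allocator. Declaring one pool per thread (for instance as a thread_local object) would avoid locking, at the cost of requiring buffers to be released on the thread that allocated them.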