> On Oct 6, 2017, at 5:59 PM, Jim Mellander <jmellan...@lbl.gov> wrote:
>
> I particularly like the idea of an allocation pool in which per-packet
> information can be stored and then reused by the next packet.
>
> There are also probably some optimizations of frequent operations, now that
> we're in a 64-bit world, that could prove useful. The ones' complement
> checksum calculation in net_util.cc is one that comes to mind, especially
> since it effectively works a byte at a time (and only handles even byte
> counts). Since this is done per packet on all TCP payload, optimizing it
> seems reasonable. Here's a discussion of doing the checksum calculation in
> 64-bit arithmetic: https://locklessinc.com/articles/tcp_checksum/ . This
> website also has an x64 allocator that is claimed to be faster than
> tcmalloc, see: https://locklessinc.com/benchmarks_allocator.shtml (note: I
> haven't tried anything from this source, but find it interesting).
>
> I'm guessing there are a number of such "small" optimizations that could
> provide significant performance gains.
>
> Take care,
>
> Jim
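A minimal sketch of the 64-bit idea from the quoted message, assuming a hypothetical helper `ones_complement_sum64` (this is not Bro's actual `ones_complement_checksum` from net_util.cc, and not necessarily the technique in the linked article). It adds four bytes at a time into a 64-bit accumulator, so carries pile up in the high bits and only need to be folded once at the end, and it pads an odd trailing byte with a zero low byte as RFC 1071 describes:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not Bro's ones_complement_checksum()).
// Returns the folded 16-bit ones' complement sum (RFC 1071) of len bytes.
// The Internet checksum itself is the bitwise complement of this value.
uint16_t ones_complement_sum64(const uint8_t* data, size_t len) {
    assert(data != nullptr || len == 0);
    uint64_t sum = 0;  // 64-bit accumulator: carries collect in the high bits
    size_t i = 0;

    // Main loop: add 4 bytes at a time as a big-endian 32-bit word.
    // Since 2^16 == 1 (mod 0xFFFF), a 32-bit word hi:lo contributes the same
    // as the two 16-bit words hi + lo, so the final fold gives the same result
    // as a 16-bit-at-a-time loop.
    for (; i + 4 <= len; i += 4) {
        sum += (uint64_t{data[i]} << 24) | (uint64_t{data[i + 1]} << 16) |
               (uint64_t{data[i + 2]} << 8) | uint64_t{data[i + 3]};
    }
    // Tail: at most one remaining 16-bit word...
    if (i + 2 <= len) {
        sum += (uint64_t{data[i]} << 8) | data[i + 1];
        i += 2;
    }
    // ...and at most one odd byte, padded with a zero low byte.
    if (i < len) {
        sum += uint64_t{data[i]} << 8;
    }
    // Fold carries back into the low 16 bits until none remain.
    while (sum >> 16) {
        sum = (sum & 0xFFFF) + (sum >> 16);
    }
    return static_cast<uint16_t>(sum);
}
```

For the RFC 1071 example bytes 00 01 f2 03 f4 f5 f6 f7 this returns 0xddf2, whose complement 0x220d is the checksum. The shifts keep the sketch portable and endian-independent; a tuned version would more likely load native-order 64-bit words and rely on the byte-order independence of the ones' complement sum, which is the kind of trick the linked article discusses.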
I've been messing around with 'perf top', and the ones' complement function often shows up fairly high, along with PriorityQueue::BubbleDown and BaseList::remove.

Something (in our configuration?) is doing a lot of PQ_TimerMgr::~PQ_TimerMgr... I don't think I've come across that class in Bro before. I think a script may be triggering something that is hurting performance, though I can't think of what it would be.

Running perf top with -F 19999 on a random worker right now shows:

    Samples: 485K of event 'cycles', Event count (approx.): 26046568975
    Overhead  Shared Object         Symbol
      34.64%  bro                   [.] BaseList::remove
       3.32%  libtcmalloc.so.4.2.6  [.] operator delete
       3.25%  bro                   [.] PriorityQueue::BubbleDown
       2.31%  bro                   [.] BaseList::remove_nth
       2.05%  libtcmalloc.so.4.2.6  [.] operator new
       1.90%  bro                   [.] Attributes::FindAttr
       1.41%  bro                   [.] Dictionary::NextEntry
       1.27%  libc-2.17.so          [.] __memcpy_ssse3_back
       0.97%  bro                   [.] StmtList::Exec
       0.87%  bro                   [.] Dictionary::Lookup
       0.85%  bro                   [.] NameExpr::Eval
       0.84%  bro                   [.] BroFunc::Call
       0.80%  libtcmalloc.so.4.2.6  [.] tc_free
       0.77%  libtcmalloc.so.4.2.6  [.] operator delete[]
       0.70%  bro                   [.] ones_complement_checksum
       0.60%  libtcmalloc.so.4.2.6  [.] tcmalloc::ThreadCache::ReleaseToCentralCache
       0.60%  bro                   [.] RecordVal::RecordVal
       0.53%  bro                   [.] UnaryExpr::Eval
       0.51%  bro                   [.] ExprStmt::Exec
       0.51%  bro                   [.] iosource::Manager::FindSoonest
       0.50%  libtcmalloc.so.4.2.6  [.] operator new[]

These entries sum to 59.2%.

BaseList::remove/BaseList::remove_nth seem particularly easy to optimize. Can't that loop be replaced by a memmove? I think something may be broken if it's being called that much, though.

--
Justin Azoff

_______________________________________________
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev
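On the memmove question for BaseList::remove_nth: a minimal sketch, assuming a hypothetical `PtrList` (a growable array of `void*` entries; this is not Bro's actual BaseList, whose internals may differ). Removing the n-th entry shifts the whole tail down with a single `memmove` instead of an element-by-element loop:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical minimal pointer list; NOT Bro's BaseList.
class PtrList {
public:
    PtrList() = default;
    ~PtrList() { std::free(entries_); }
    PtrList(const PtrList&) = delete;
    PtrList& operator=(const PtrList&) = delete;

    void append(void* e) {
        if (num_ == cap_) {  // grow geometrically
            int new_cap = cap_ ? cap_ * 2 : 8;
            void** p = static_cast<void**>(
                std::realloc(entries_, static_cast<size_t>(new_cap) * sizeof(void*)));
            if (!p) std::abort();
            entries_ = p;
            cap_ = new_cap;
        }
        entries_[num_++] = e;
    }

    // Remove and return the n-th entry (nullptr if out of range).
    // The tail [n+1, old num_) is shifted down by one slot with one memmove,
    // which handles the overlapping source and destination correctly.
    void* remove_nth(int n) {
        if (n < 0 || n >= num_) return nullptr;
        void* old = entries_[n];
        --num_;
        std::memmove(&entries_[n], &entries_[n + 1],
                     static_cast<size_t>(num_ - n) * sizeof(void*));
        return old;
    }

    // Linear search for e, then remove it. The search itself is still O(n).
    void* remove(void* e) {
        for (int i = 0; i < num_; ++i)
            if (entries_[i] == e) return remove_nth(i);
        return nullptr;
    }

    int length() const { return num_; }

    void* operator[](int i) const {
        assert(i >= 0 && i < num_);
        return entries_[i];
    }

private:
    void** entries_ = nullptr;
    int cap_ = 0;
    int num_ = 0;
};
```

The memmove only speeds up the shift. Assuming remove() locates the entry by linear search, as this sketch does, each call is still O(n), so if BaseList::remove really accounts for a third of the cycles, the number of calls or the length of the lists involved is probably the bigger problem, as suspected above.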