Re: [Bro-Dev] Performance Enhancements

Jim Mellander Fri, 06 Oct 2017 17:01:14 -0700

Interesting info.  The more than order-of-magnitude difference in time
between BaseList::remove & BaseList::remove_nth suggests the possibility
that the for loop in BaseList::remove is falling off the end in many cases
(i.e. attempting to remove an item that doesn't exist).  Maybe that's what's
broken.
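
To make the hypothesis concrete, here is a minimal sketch of a pointer list
with a search-then-remove operation. It is not Bro's actual BaseList code:
PtrList, list_remove, list_remove_nth and the `scanned` counter are
hypothetical names invented for illustration. The removal shifts the tail
with a single memmove, as suggested in the quoted message below.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a pointer list; NOT Bro's actual BaseList.
struct PtrList {
    void** entries;     // caller-owned array of element pointers
    std::size_t count;  // number of live entries
};

// Remove the entry at index n, closing the gap with one memmove
// instead of an element-by-element copy loop.
void* list_remove_nth(PtrList& l, std::size_t n) {
    assert(l.entries != nullptr);
    if (n >= l.count)
        return nullptr;
    void* old = l.entries[n];
    std::memmove(&l.entries[n], &l.entries[n + 1],
                 (l.count - n - 1) * sizeof(void*));
    --l.count;
    return old;
}

// Linear search for `item`, then remove it. `scanned` reports how many
// entries were compared, to show the cost of a miss.
void* list_remove(PtrList& l, void* item, std::size_t& scanned) {
    scanned = 0;
    for (std::size_t i = 0; i < l.count; ++i) {
        ++scanned;
        if (l.entries[i] == item)
            return list_remove_nth(l, i);
    }
    return nullptr;  // fell off the end: item was not in the list
}
```

In this sketch a miss always compares every entry, while a hit near the
front stops early, so a workload dominated by misses would spend far more
time in the search loop than in the removal itself.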




On Fri, Oct 6, 2017 at 3:49 PM, Azoff, Justin S <jaz...@illinois.edu> wrote:

>
> > On Oct 6, 2017, at 5:59 PM, Jim Mellander <jmellan...@lbl.gov> wrote:
> >
> > I particularly like the idea of an allocation pool in which per-packet
> information can be stored and reused by the next packet.
> >
> > There also are probably some optimizations of frequent operations now
> that we're in a 64-bit world that could prove useful - the one's complement
> checksum calculation in net_util.cc is one that comes to mind, especially
> since it works effectively a byte at a time (and works with even byte
> counts only).  Seeing as this is done per-packet on all tcp payload,
> optimizing this seems reasonable.  Here's a discussion of doing the checksum
> calc in 64-bit arithmetic: https://locklessinc.com/articles/tcp_checksum/
> - this website also has an x64 allocator that is claimed to be faster than
> tcmalloc, see: https://locklessinc.com/benchmarks_allocator.shtml  (note:
> I haven't tried anything from this source, but find it interesting).
> >
> > I'm guessing there are a number of such "small" optimizations that could
> provide significant performance gains.
> >
> > Take care,
> >
> > Jim
>
> I've been messing around with 'perf top'; the one's complement function
> often shows up fairly high up, along with PriorityQueue::BubbleDown and
> BaseList::remove.
>
> Something (on our configuration?) is doing a lot of
> PQ_TimerMgr::~PQ_TimerMgr... I don't think I've come across that class
> before in bro.. I think a script may be triggering something that is
> hurting performance.  I can't think of what it would be though.
>
> Running perf top on a random worker right now with -F 19999 shows:
>
> Samples: 485K of event 'cycles', Event count (approx.): 26046568975
> Overhead  Shared Object                 Symbol
>   34.64%  bro                           [.] BaseList::remove
>    3.32%  libtcmalloc.so.4.2.6          [.] operator delete
>    3.25%  bro                           [.] PriorityQueue::BubbleDown
>    2.31%  bro                           [.] BaseList::remove_nth
>    2.05%  libtcmalloc.so.4.2.6          [.] operator new
>    1.90%  bro                           [.] Attributes::FindAttr
>    1.41%  bro                           [.] Dictionary::NextEntry
>    1.27%  libc-2.17.so                  [.] __memcpy_ssse3_back
>    0.97%  bro                           [.] StmtList::Exec
>    0.87%  bro                           [.] Dictionary::Lookup
>    0.85%  bro                           [.] NameExpr::Eval
>    0.84%  bro                           [.] BroFunc::Call
>    0.80%  libtcmalloc.so.4.2.6          [.] tc_free
>    0.77%  libtcmalloc.so.4.2.6          [.] operator delete[]
>    0.70%  bro                           [.] ones_complement_checksum
>    0.60%  libtcmalloc.so.4.2.6          [.] tcmalloc::ThreadCache::ReleaseToCentralCache
>    0.60%  bro                           [.] RecordVal::RecordVal
>    0.53%  bro                           [.] UnaryExpr::Eval
>    0.51%  bro                           [.] ExprStmt::Exec
>    0.51%  bro                           [.] iosource::Manager::FindSoonest
>    0.50%  libtcmalloc.so.4.2.6          [.] operator new[]
>
>
> Which sums up to 59.2%
>
> BaseList::remove/BaseList::remove_nth seems particularly easy to
> optimize. Can't that loop be replaced by a memmove?
> I think something may be broken if it's being called that much though.
>
>
>
> --
> Justin Azoff
>
>
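
For the checksum idea in the quoted text, a rough sketch of summing wider
words into a 64-bit accumulator might look like the following. This is not
Bro's ones_complement_checksum() and is not taken from the linked article:
ones_sum_wide is a hypothetical name, and the function returns the folded
(not inverted) 16-bit one's complement sum in native byte order. It relies
on the fact that 2^16 is congruent to 1 modulo 65535, so adding 32-bit words
and folding gives the same result as adding 16-bit words.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, NOT Bro's ones_complement_checksum().
// Returns the folded one's complement sum of `len` bytes, treating the
// buffer as native-order 16-bit words; an odd trailing byte is padded
// with a zero byte.
std::uint16_t ones_sum_wide(const unsigned char* p, std::size_t len) {
    std::uint64_t sum = 0;

    // Add four bytes per iteration; memcpy avoids unaligned-access issues.
    while (len >= 4) {
        std::uint32_t w;
        std::memcpy(&w, p, 4);
        sum += w;
        p += 4;
        len -= 4;
    }
    if (len >= 2) {
        std::uint16_t h;
        std::memcpy(&h, p, 2);
        sum += h;
        p += 2;
        len -= 2;
    }
    if (len == 1) {
        std::uint16_t h = 0;
        std::memcpy(&h, p, 1);  // last byte, zero-padded to a full word
        sum += h;
    }

    // Fold the carries back in until the sum fits in 16 bits.
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return static_cast<std::uint16_t>(sum);
}
```

Storing the returned value to memory with a native 16-bit store yields the
sum in network byte order on either endianness, and the odd-length tail is
handled explicitly. Whether this is actually faster than the existing loop
would need to be measured.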

_______________________________________________
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev
