Jon Siwek created BIT-1161:
------------------------------

             Summary: topic/jsiwek/faster-val-clone
                 Key: BIT-1161
                 URL: https://bro-tracker.atlassian.net/browse/BIT-1161
             Project: Bro Issue Tracker
          Issue Type: Improvement
          Components: Bro
    Affects Versions: git/master
            Reporter: Jon Siwek
             Fix For: 2.3


This branch makes it less expensive to serialize large/complex values (e.g. 
connection and/or fa_file records).

The obvious overhead to reduce was the fixed-size growth increment of the 
buffer that holds the serialized data.  With records that expand to ~1.6M 
(master) or ~3M (topic/bernhard/file-analysis-x509) in serialized form, getting 
there in growth increments of 64K takes too many allocations.  It may also help 
somewhat to use realloc instead of new/memcpy/delete each time the buffer needs 
to grow.
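
To illustrate the allocation-count point, here is a minimal sketch of a buffer 
that grows geometrically via realloc, next to a helper that counts allocations 
under a fixed-step policy.  GrowBuf, buf_append, buf_free and 
fixed_step_allocs are hypothetical names made up for this example, not Bro's 
actual serialization classes, and doubling is just one common growth policy; 
the message above does not say exactly which policy the branch uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the serialization buffer (not Bro's real class).
struct GrowBuf {
    char* data = nullptr;
    std::size_t len = 0;    // bytes in use
    std::size_t cap = 0;    // bytes allocated
    std::size_t grows = 0;  // number of (re)allocations, for illustration only
};

// Append n bytes, doubling the capacity as needed (first allocation is 64K).
// Overflow of new_cap is not handled in this sketch.
bool buf_append(GrowBuf& b, const void* src, std::size_t n) {
    assert(b.len <= b.cap);
    if (n > b.cap - b.len) {
        std::size_t new_cap = b.cap ? b.cap : 64 * 1024;
        while (new_cap - b.len < n)
            new_cap *= 2;
        // realloc may extend the block in place, avoiding new/memcpy/delete.
        void* p = std::realloc(b.data, new_cap);
        if (!p)
            return false;  // the old block is still valid on failure
        b.data = static_cast<char*>(p);
        b.cap = new_cap;
        ++b.grows;
    }
    std::memcpy(b.data + b.len, src, n);
    b.len += n;
    return true;
}

void buf_free(GrowBuf& b) {
    std::free(b.data);
    b = GrowBuf{};
}

// Allocations a fixed-step policy needs to hold `total` bytes.
std::size_t fixed_step_allocs(std::size_t total, std::size_t step) {
    return (total + step - 1) / step;
}
```

Appending 400 chunks of 4096 bytes (1,638,400 bytes total, roughly the ~1.6M 
case) takes 6 allocations with doubling, versus 25 with fixed 64K steps.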

I didn't find it helped much to increase the initial buffer size from 64K (and 
90% of the things needing serialization fit in that size buffer anyway).

It could possibly help to preallocate a buffer that gets re-used across 
serializations instead of repeatedly allocating small buffers that will need to 
be resized.
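
A rough sketch of that re-use idea, assuming a single long-lived scratch buffer 
owned by whoever drives serialization.  ScratchBuffer and its methods are 
hypothetical names for illustration; nothing like it is claimed to exist in the 
branch.  It relies on std::vector::clear() leaving the capacity unchanged, so a 
later serialization that fits in the already-grown buffer allocates nothing.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reusable scratch buffer (illustration only).
class ScratchBuffer {
public:
    explicit ScratchBuffer(std::size_t initial) { buf_.reserve(initial); }

    // Start a new serialization: drop the contents, keep the allocation.
    void begin() { buf_.clear(); }

    // Append n bytes; the vector grows only if capacity is exceeded.
    void write(const char* p, std::size_t n) {
        buf_.insert(buf_.end(), p, p + n);
    }

    const char* data() const { return buf_.data(); }
    std::size_t size() const { return buf_.size(); }
    std::size_t capacity() const { return buf_.capacity(); }

private:
    std::vector<char> buf_;
};
```

The trade-off is that one very large record leaves the buffer at its high-water 
mark until it is destroyed, so this only pays off if holding that memory is 
acceptable.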

I don't have a complete breakdown of the bytes that make up the serialized 
version of the large/complex records, but from a quick look, the filenames in 
the Location information of each BroObj/Val make up a third of the ~1.6M 
(master).  And that's the full path of each file, so this will all depend on 
where the Bro scripts reside on the file system (i.e. put them as close to the 
root dir as possible and you might increase performance!).
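
One generic way to shrink repeated strings like these is to write each distinct 
path once and refer to it by a small index afterwards.  The sketch below shows 
only that idea; PathTable is a hypothetical helper, not part of Bro or of this 
branch, and wiring something like it into the serialization format would need 
matching changes on the unserialize side.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical string table: maps each distinct path to a small integer id.
class PathTable {
public:
    // Return the id for `path`, assigning the next free id on first sight.
    std::uint32_t intern(const std::string& path) {
        auto it = index_.find(path);
        if (it != index_.end())
            return it->second;
        std::uint32_t id = static_cast<std::uint32_t>(paths_.size());
        paths_.push_back(path);
        index_.emplace(path, id);
        return id;
    }

    // Map an id back to its path (throws std::out_of_range on a bad id).
    const std::string& lookup(std::uint32_t id) const { return paths_.at(id); }

    std::size_t size() const { return paths_.size(); }

private:
    std::vector<std::string> paths_;
    std::unordered_map<std::string, std::uint32_t> index_;
};
```

With a table like this, a path that appears in many Location entries would cost 
its full length once plus a 4-byte id per further occurrence.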

Any other quick ideas of what can be done here?  If not, improving the 
serialization seems to deserve its own project (which also might be part of the 
new comm. library project) for later.

In the meantime, it's at least shown that avoiding situations where 
large/complex records are serialized can help (BIT-1139).  And that might 
always be a useful optimization strategy if the serialized representation of 
Vals is going to scale not just as a function of their value, but also w/ their 
type/attribute/location information.



--
This message was sent by Atlassian JIRA
(v6.2-OD-10-004-WN#6253)
_______________________________________________
bro-dev mailing list
[email protected]
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev
