Jon Siwek created BIT-1161:
------------------------------
Summary: topic/jsiwek/faster-val-clone
Key: BIT-1161
URL: https://bro-tracker.atlassian.net/browse/BIT-1161
Project: Bro Issue Tracker
Issue Type: Improvement
Components: Bro
Affects Versions: git/master
Reporter: Jon Siwek
Fix For: 2.3

This branch makes it less expensive to serialize large/complex values (e.g.
connection and/or fa_file records).

The obvious overhead that could be reduced came from growing the buffer that
holds serialized data in fixed-size increments. With records that expand to
~1.6M (master) or ~3M (topic/bernhard/file-analysis-x509) in serialized form,
growing in 64K increments takes too many allocations to get there. It may
also help somewhat to use realloc instead of new/memcpy/delete each time the
buffer needs to grow.
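
To make the allocation count concrete, here is a minimal sketch comparing
fixed-step growth against doubling, using realloc in both cases. GrowBuf and
CountReallocs are hypothetical names, not Bro's actual serialization buffer,
and doubling is an assumption chosen for illustration; the text above does not
say which growth policy the branch uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical growable byte buffer (illustration only, not Bro's class).
// fixed_step == 0 selects capacity doubling; any other value grows the
// capacity by that many bytes at a time.
class GrowBuf {
public:
    explicit GrowBuf(std::size_t initial, std::size_t fixed_step = 0)
        : data_(static_cast<char*>(std::malloc(initial))),
          cap_(initial), step_(fixed_step)
    {
        assert(initial > 0 && data_ != nullptr);
    }

    ~GrowBuf() { std::free(data_); }
    GrowBuf(const GrowBuf&) = delete;
    GrowBuf& operator=(const GrowBuf&) = delete;

    // Appends n bytes, growing the buffer first if they do not fit.
    bool Append(const void* src, std::size_t n)
    {
        if (n > cap_ - len_ && !Reserve(len_ + n))
            return false;
        std::memcpy(data_ + len_, src, n);
        len_ += n;
        return true;
    }

    std::size_t Reallocs() const { return reallocs_; }

private:
    bool Reserve(std::size_t needed)
    {
        std::size_t new_cap = cap_;
        while (new_cap < needed)
            new_cap = step_ ? new_cap + step_ : new_cap * 2;

        // realloc may extend the block in place, avoiding the explicit
        // new/memcpy/delete round trip.
        void* p = std::realloc(data_, new_cap);
        if (!p)
            return false;  // the old block is still valid and owned by us
        data_ = static_cast<char*>(p);
        cap_ = new_cap;
        ++reallocs_;
        return true;
    }

    char* data_;
    std::size_t len_ = 0;
    std::size_t cap_;
    std::size_t step_;
    std::size_t reallocs_ = 0;
};

// Writes `total` bytes in 1K chunks and reports how often the buffer grew.
std::size_t CountReallocs(std::size_t total, std::size_t initial,
                          std::size_t fixed_step)
{
    GrowBuf buf(initial, fixed_step);
    const char chunk[1024] = {0};
    for (std::size_t written = 0; written < total; written += sizeof(chunk))
        buf.Append(chunk, sizeof(chunk));
    return buf.Reallocs();
}
```

Under these assumptions, filling 1600K starting from a 64K buffer takes 24
reallocations with a fixed 64K step and 5 with doubling, at the cost of some
unused capacity at the end.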

I didn't find that increasing the initial buffer size from 64K helped much
(90% of the things needing serialization fit in a buffer of that size anyway).
It could possibly help to preallocate a buffer that gets reused across
serializations instead of repeatedly allocating small buffers that then need
to be resized.
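
The reuse idea could look roughly like the sketch below. ScratchSerializer is
a hypothetical name, and copying a string stands in for real Val
serialization; the only point is that the buffer keeps its capacity between
calls, so a large record pays for growth once rather than on every
serialization.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical serializer front end that owns one long-lived scratch buffer.
class ScratchSerializer {
public:
    explicit ScratchSerializer(std::size_t reserve_bytes)
    {
        assert(reserve_bytes > 0);
        buf_.reserve(reserve_bytes);
    }

    // Stand-in for real serialization: copies the payload bytes into the
    // scratch buffer. The returned reference is only valid until the next
    // call to Serialize().
    const std::vector<char>& Serialize(const std::string& payload)
    {
        buf_.clear();  // drops the contents but keeps the allocated capacity
        buf_.insert(buf_.end(), payload.begin(), payload.end());
        return buf_;
    }

    std::size_t Capacity() const { return buf_.capacity(); }

private:
    std::vector<char> buf_;
};
```

The trade-off is that the buffer stays as large as the biggest record seen so
far, and callers must copy the result if they need it beyond the next call.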

I don't have a complete breakdown of the bytes that make up the serialized
form of the large/complex records, but from a quick look, the filenames from
the Location information of each BroObj/Val make up a third of the ~1.6M
(master). That's the full path of each file, so the size depends on where the
Bro scripts reside on the file system (i.e. put them as close to the root dir
as possible and you might increase performance!).
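
A rough way to see why the install path matters: if every Location carries
the full path of its file, the bytes spent on filenames grow with the length
of the script root times the number of locations. The function below is a
back-of-the-envelope sketch; FilenameBytes, its parameters, and any paths or
counts passed to it are hypothetical, and it ignores any per-string framing
the real serializer adds.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Estimates the bytes contributed by filenames when each serialized location
// stores the full path (script_root + relative file name). Assumes, for
// simplicity, the same number of locations per file.
std::size_t FilenameBytes(const std::string& script_root,
                          const std::vector<std::string>& relative_files,
                          std::size_t locations_per_file)
{
    assert(locations_per_file > 0);
    std::size_t total = 0;
    for (const auto& file : relative_files)
        total += (script_root.size() + file.size()) * locations_per_file;
    return total;
}
```

Each character removed from the script root saves one byte per serialized
location, which is why moving the scripts closer to the root dir shrinks the
result.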

Any other quick ideas of what can be done here? If not, improving
serialization seems to deserve its own project for later (which might also be
part of the new comm. library project).

In the meantime, it has at least been shown (BIT-1139) that avoiding
situations where large/complex records are serialized can help. That might
always be a useful optimization strategy if the serialized representation of
Vals scales not just with their value, but also with their
type/attribute/location information.