[
https://bro-tracker.atlassian.net/browse/BIT-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15808#comment-15808
]
Jon Siwek commented on BIT-1161:
--------------------------------
{quote}
The obvious overhead that could be reduced came from the fixed growth
increment of the buffer used to contain serialized data. With records that
expand out to ~1.6M (master) or ~3M (topic/bernhard/file-analysis-x509) in
serialized form, it takes too many allocations to get there in growth
increments of 64K. It may also help somewhat to use realloc instead of
new/memcpy/delete each time the buffer needs to grow.
{quote}
Note that the benefit of this optimization is more pronounced on Bernhard's
branch. I don't think the doubling of the serialized data size there is
necessarily wrong or needs to be fixed/changed, but it might be worth
double-checking whether some of the redefs of SSL::Info or Files::Info can be
streamlined.
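
To put rough numbers on the quoted point, here is a minimal sketch that counts how many times a buffer has to grow under each strategy. The function names are made up for illustration, and the sizes are assumptions: I read "64K" as 64 KiB, "~1.6M" as 1600 KiB and "~3M" as 3 MiB, and doubling is just one possible replacement for the fixed increment, not necessarily what the branch does.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of growth steps needed to get from `initial`
// bytes of capacity to at least `target` bytes, adding a fixed `increment`
// on each step.
std::size_t GrowthsFixed(std::size_t target, std::size_t initial,
                         std::size_t increment)
{
    assert(increment > 0);
    std::size_t cap = initial;
    std::size_t n = 0;
    while (cap < target) {
        cap += increment;
        ++n;
    }
    return n;
}

// Hypothetical helper: same count, but doubling the capacity on each step.
std::size_t GrowthsDoubling(std::size_t target, std::size_t initial)
{
    assert(initial > 0);
    std::size_t cap = initial;
    std::size_t n = 0;
    while (cap < target) {
        cap *= 2;
        ++n;
    }
    return n;
}
```

Under those assumed sizes, the fixed 64K increment needs 24 growth steps to hold 1600 KiB and 47 to hold 3 MiB, while doubling needs 5 and 6. With new/memcpy/delete every one of those steps also copies everything written so far.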
> topic/jsiwek/faster-val-clone
> -----------------------------
>
> Key: BIT-1161
> URL: https://bro-tracker.atlassian.net/browse/BIT-1161
> Project: Bro Issue Tracker
> Issue Type: Improvement
> Components: Bro
> Affects Versions: git/master
> Reporter: Jon Siwek
> Fix For: 2.3
>
>
> This branch makes it less expensive to serialize large/complex values (e.g.
> connection and/or fa_file records).
> The obvious overhead that could be reduced came from the fixed growth
> increment of the buffer used to contain serialized data. With records
> that expand out to ~1.6M (master) or ~3M (topic/bernhard/file-analysis-x509)
> in serialized form, it takes too many allocations to get there in growth
> increments of 64K. It may also help somewhat to use realloc instead of
> new/memcpy/delete each time the buffer needs to grow.
> I didn't find it helped much to increase the initial buffer size from 64K
> (and 90% of the things needing serialization fit in that size buffer anyway).
> It could possibly help to preallocate a buffer that gets re-used across
> serializations instead of repeatedly allocating small buffers that will need
> to be resized.
> I don't have a complete breakdown/view of the bytes that make up the
> serialized version of the large/complex records, but taking a quick look I
> note that the filenames from the Location information of each BroObj/Val make
> up a third of the ~1.6M (master). And that's the full path of each file, so
> this will all depend on where the Bro scripts reside on the file system (i.e.
> put them as close to the root dir as possible and you might increase
> performance!).
> Any other quick ideas of what can be done here? If not, improving the
> serialization seems to deserve its own project (which also might be part of
> the new comm. library project) for later.
> In the meantime, it's at least been shown that avoiding situations where
> large/complex records are serialized can help (BIT-1139). And that might
> always be a useful optimization strategy if the serialized representation of
> Vals is going to scale not just as a function of their value, but also with
> their type/attribute/location information.
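
For the idea in the description of preallocating one buffer and re-using it across serializations, a rough sketch could look like the following. ReusableBuffer and all of its members are hypothetical names, not Bro's actual serialization buffer; it assumes a 64K starting capacity, grows by doubling via realloc, and does not guard against size overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical byte buffer that keeps its allocation between uses.
class ReusableBuffer {
public:
    explicit ReusableBuffer(std::size_t initial_cap = 64 * 1024)
        : len_(0), cap_(initial_cap > 0 ? initial_cap : 1), grow_count_(0)
    {
        data_ = static_cast<char*>(std::malloc(cap_));
        if (!data_)
            throw std::bad_alloc();
    }

    ~ReusableBuffer() { std::free(data_); }

    ReusableBuffer(const ReusableBuffer&) = delete;
    ReusableBuffer& operator=(const ReusableBuffer&) = delete;

    // Appends n bytes, growing the allocation first if they do not fit.
    void Append(const void* src, std::size_t n)
    {
        if (n == 0)
            return;
        if (len_ + n > cap_)
            Grow(len_ + n);
        std::memcpy(data_ + len_, src, n);
        len_ += n;
    }

    // Forgets the contents but keeps the capacity, so the next
    // serialization starts with whatever size the last one needed.
    void Reset() { len_ = 0; }

    const char* Data() const { return data_; }
    std::size_t Size() const { return len_; }
    std::size_t Capacity() const { return cap_; }
    std::size_t GrowCount() const { return grow_count_; }

private:
    // Doubles the capacity until `needed` bytes fit, then calls realloc once.
    void Grow(std::size_t needed)
    {
        std::size_t new_cap = cap_;
        while (new_cap < needed)
            new_cap *= 2;
        assert(new_cap >= needed);
        char* p = static_cast<char*>(std::realloc(data_, new_cap));
        if (!p)
            throw std::bad_alloc();
        data_ = p;
        cap_ = new_cap;
        ++grow_count_;
    }

    char* data_;
    std::size_t len_;
    std::size_t cap_;
    std::size_t grow_count_;
};
```

The intent is that one large record pays for the growth once, and later serializations call Reset() and write into the already-sized allocation. The trade-off is that the buffer stays at its high-water mark even though most values fit in 64K.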