[
https://bro-tracker.atlassian.net/browse/BIT-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706#comment-15706
]
Jon Siwek commented on BIT-1143:
--------------------------------
I've got topic/jsiwek/file-signatures in the bro, 3rdparty, bro-testing, and
bro-testing-private repos to a point where they might be ready to merge, or at
least I'm unsure what more to do with it at the moment. Seth, do you want this
assigned to you so you can first look over the new file magic signatures (maybe
look for important MIME types that are somehow missing, or try improving some
regexes)? I'm also open to others taking a look and making suggestions.

New file magic signatures: these are derived from the default libmagic magic
database in a semi-automatic/assisted way. I instrumented a version of the
{{file}} command, see https://github.com/jsiwek/file/tree/bro-signatures, to
get at the internal representation of the magic rules and had it emit Bro
signatures for any set of rules associated with a MIME type. The conversion
logic is not currently perfect for all combinations of magic rules and the
effort to make it perfect didn't seem worth it, so warnings are emitted upon
encountering tricky scenarios. Afterward, I did a pass over everything and
manually fixed (or just removed, depending on circumstances) the cases where it
indicated an automatic conversion might not be correct.
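
As a rough illustration of the conversion step described above (not the actual
instrumented {{file}} code, which works on libmagic's internal representation),
here is a minimal Python sketch. The function name, its parameters, and the
"kind" field are hypothetical. The file-mime / file-magic layout and the use
of "." to skip bytes are assumptions about the signature syntax and should be
checked against the .sig files on the branch. It handles only a literal string
at a fixed offset and warns on anything else, in the spirit of warning on
tricky scenarios.

```python
import warnings


def magic_rule_to_bro_sig(name, mime, strength, offset, literal, kind="string"):
    """Render one simple magic rule as Bro signature text (hypothetical helper).

    Returns None and emits a warning for rule kinds this sketch cannot convert.
    """
    if kind != "string" or offset < 0:
        # Mirrors the idea of flagging conversions that need a manual pass.
        warnings.warn("cannot convert rule %r automatically" % name)
        return None

    # Escape every byte as \xNN so regex metacharacters need no special care.
    body = "".join("\\x%02x" % b for b in literal)
    # Assumption: '^' anchors at the start of the file data and '.' skips a byte.
    prefix = "^" if offset == 0 else "^.{%d}" % offset

    return (
        "signature file-magic-%s {\n"
        '    file-mime "%s", %d\n'
        "    file-magic /%s%s/\n"
        "}\n" % (name, mime, strength, prefix, body)
    )


if __name__ == "__main__":
    print(magic_rule_to_bro_sig("pdf", "application/pdf", 80, 0, b"%PDF-"))
```

Anything the function declines to convert would then be fixed or dropped by
hand, as in the manual pass mentioned above.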

Signature maintenance: Going forward, Bro's file signatures can be considered
on their own and improved independently of libmagic's rules (i.e. there's no
required/extra/continual maintenance task in updating signatures, though the
libmagic database would probably still be useful for reference when someone is
trying to improve/add signatures).

Signature accuracy: Surprisingly, Bro's test suites don't detect file types
much differently with the new signatures than with libmagic. The variance is
actually less than I've seen in switching between versions of libmagic. And
the differences in detected MIME types are at least somewhat reasonable -- the
most questionable differences are the text/plain detections, because libmagic
has built-in logic for various text encodings/charsets, but the signature I
ended up writing to fill that gap just does ASCII for now.
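
To make the ASCII-only limitation concrete, here is a small Python sketch of
an ASCII text check. The pattern and function name are hypothetical and are
not the signature on the branch; the point is only that text in another
encoding (UTF-8 with non-ASCII characters, say) would fall outside such a
rule, which is where the text/plain differences would come from.

```python
import re

# Hypothetical rule: every byte is printable ASCII or tab, LF, or CR.
ASCII_TEXT = re.compile(rb"[\x20-\x7e\t\n\r]+")


def looks_like_ascii_text(data: bytes) -> bool:
    """Return True if the whole (non-empty) buffer is plain ASCII text."""
    return ASCII_TEXT.fullmatch(data) is not None


if __name__ == "__main__":
    print(looks_like_ascii_text(b"hello, world\n"))
    print(looks_like_ascii_text("h\u00e9llo\n".encode("utf-8")))
```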

Signature performance: I didn't do very robust profiling/benchmarking, but I
found slight improvements in various configurations, in terms of both
instructions and time, when running against the long m57 pcap. That at least
matches the expectation that it can't theoretically be worse than libmagic's
approach, so I didn't dig any deeper. It should also scale better as the
number of signatures increases.

Signature unit tests: there are no new regression tests in place for the new
file magic signatures. Those could take a while to make; are they required
immediately, or can they wait? And any opinion on the structure of such a test
suite? I imagine just having the test suite in the bro repo, but a corpus of
file types to test against is probably going to need some other canonical
place to live.
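
One possible shape for such a test, sketched in Python under stated
assumptions: the corpus is described by a mapping from file name to expected
MIME type, and "detect" is a hypothetical callable standing in for whatever
would actually run Bro over a file and read back the detected type. Nothing
here reflects an existing harness.

```python
def compare_detections(expected, detect):
    """Return {file name: (expected MIME, detected MIME)} for each mismatch.

    expected: dict mapping corpus file names to their expected MIME types.
    detect:   hypothetical callable taking a file name, returning a MIME type.
    """
    mismatches = {}
    for name, want in sorted(expected.items()):
        got = detect(name)
        if got != want:
            mismatches[name] = (want, got)
    return mismatches


if __name__ == "__main__":
    # Stand-in detector with canned answers, for illustration only.
    canned = {"a.pdf": "application/pdf", "b.txt": "application/octet-stream"}
    expected = {"a.pdf": "application/pdf", "b.txt": "text/plain"}
    print(compare_detections(expected, canned.get))
```

Keeping the expected-type table next to the tests while the files themselves
live elsewhere would fit the split suggested above between the bro repo and a
separate corpus location.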

> Investigate replacing libmagic w/ signatures for file identification
> --------------------------------------------------------------------
>
> Key: BIT-1143
> URL: https://bro-tracker.atlassian.net/browse/BIT-1143
> Project: Bro Issue Tracker
> Issue Type: New Feature
> Components: Bro
> Affects Versions: git/master
> Reporter: Jon Siwek
> Assignee: Jon Siwek
> Fix For: 2.3
>
>
> I think it makes sense to try to make the switch from libmagic to using Bro's
> own signature engine for file identification before the next release. Don't
> want people getting used to magic file format for their own custom file
> identification rules.