[ 
https://bro-tracker.atlassian.net/browse/BIT-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706#comment-15706
 ] 

Jon Siwek commented on BIT-1143:
--------------------------------

I've got topic/jsiwek/file-signatures in the bro, 3rdparty, bro-testing, and 
bro-testing-private repos to a point where they might be ready to merge, or at 
least I'm unsure what more to do w/ them at the moment.  Seth, do you want this 
assigned to you to first look over the new file magic signatures (maybe look 
for important MIME types that are somehow missing, or try improving some 
regexes)?  Others are also welcome to take a look and make suggestions.

New file magic signatures:  these are derived from the default libmagic magic 
database in a semi-automatic/assisted way.  I instrumented a version of the 
{{file}} command, see https://github.com/jsiwek/file/tree/bro-signatures, to 
get at the internal representation of the magic rules and had it emit Bro 
signatures for any set of rules associated with a MIME type.  The conversion 
logic is not currently perfect for all combinations of magic rules and the 
effort to make it perfect didn't seem worth it, so warnings are emitted upon 
encountering tricky scenarios.  Afterward, I did a pass over everything and 
manually fixed (or just removed, depending on circumstances) the cases where it 
indicated an automatic conversion might not be correct.
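
As a rough illustration of the kind of conversion described above (this is not 
the instrumented {{file}} code itself, which lives in the linked branch), a 
minimal Python sketch might look like the following.  The MagicRule fields, 
the function names, and the choice of "indirect offset" as the tricky case are 
all assumptions made up for this example; the emitted text follows the 
file-mime / file-magic keywords used in Bro signature files.

```python
import warnings
from dataclasses import dataclass


@dataclass
class MagicRule:
    """Hypothetical, heavily simplified stand-in for one libmagic rule."""
    offset: int             # byte offset at which the literal must appear
    pattern: bytes          # literal bytes to match at that offset
    mime: str               # MIME type the rule reports
    strength: int           # relative priority among matching rules
    indirect: bool = False  # offset is read from the file (hard to convert)


def escape_bytes(data: bytes) -> str:
    """Keep ASCII letters/digits, hex-escape every other byte."""
    out = []
    for b in data:
        if b < 128 and chr(b).isalnum():
            out.append(chr(b))
        else:
            out.append("\\x%02x" % b)
    return "".join(out)


def to_bro_signature(name: str, rule: MagicRule) -> str:
    """Emit signature text for one rule, warning on cases needing review."""
    if rule.indirect:
        # Assumed example of a "tricky scenario": flag it for a manual pass.
        warnings.warn("%s: indirect offset, conversion may be wrong" % name)
    skip = ".{%d}" % rule.offset if rule.offset else ""
    regex = "^" + skip + escape_bytes(rule.pattern)
    return (
        "signature %s {\n"
        "\tfile-mime \"%s\", %d\n"
        "\tfile-magic /%s/\n"
        "}\n" % (name, rule.mime, rule.strength, regex)
    )


if __name__ == "__main__":
    print(to_bro_signature("file-pdf",
                           MagicRule(0, b"%PDF-", "application/pdf", 80)))
```

Rules that trigger a warning would then be the ones to fix or remove by hand 
in the follow-up pass.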

Signature maintenance:  Going forward, Bro's file signatures can be considered 
on their own and improved independently of libmagic's rules (i.e. there's no 
required/extra/continual maintenance task in updating signatures, though the 
libmagic database would probably still be useful for reference when someone is 
trying to improve/add signatures).

Signature accuracy: Surprisingly, Bro's test suites don't detect file types 
much differently with the new signatures than with libmagic.  The variance is 
actually less than I've seen when switching between versions of libmagic.  And 
the differences in detected MIME types are at least somewhat reasonable -- the 
most questionable differences are the text/plain detections, because libmagic 
has built-in logic for various text encodings/charsets, but the signature I 
ended up writing to fill that gap just does ASCII for now.
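
To make the text/plain gap concrete, here is a small Python sketch of an 
ASCII-only check.  The byte class used (tab, LF, CR, and printable 0x20-0x7e) 
is an assumption chosen for illustration and is not necessarily what the 
actual signature matches.

```python
import re

# Assumed definition of "ASCII text": tab, LF, CR, and printable 0x20-0x7e.
ASCII_TEXT = re.compile(rb"[\t\n\r\x20-\x7e]+")


def looks_like_ascii_text(head: bytes) -> bool:
    """Return True only if every byte of the buffer is plain ASCII text."""
    return ASCII_TEXT.fullmatch(head) is not None
```

Under that definition, a UTF-8 buffer such as "héllo" is rejected because of 
its non-ASCII bytes, which is the sort of case where libmagic's charset logic 
would still report text and an ASCII-only signature would not.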

Signature performance: I didn't do very robust profiling/benchmarking, but I 
found slight improvements in various configurations, in terms of both 
instruction counts and run time, against the long m57 pcap.  That at least 
matches the expectation that it can't, in theory, be worse than libmagic's 
approach, so I didn't dig any deeper.  It should also scale better as the 
number of signatures increases.

Signature unit tests: there are no new regression tests in place for the new 
file magic signatures.  Those could take a while to make; are they required 
immediately, or can they wait?  And any opinion on the structure of such a 
test suite?  I imagine just having the test suite in the bro repo, but a 
corpus of file types to test against is probably going to need some other 
canonical place to live.
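
One possible shape for such a test suite, sketched in Python purely as a 
starting point for discussion: a manifest maps each corpus file to its 
expected MIME type, and a checker reports every mismatch.  The function name, 
the manifest format, and the identify callback (standing in for whatever 
actually runs the signatures) are all hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple

Mismatch = Tuple[str, str, Optional[str]]


def check_corpus(corpus_dir: str,
                 expected: Dict[str, str],
                 identify: Callable[[bytes], Optional[str]]) -> List[Mismatch]:
    """Return (path, expected MIME, detected MIME) for every wrong detection.

    corpus_dir -- root of the file corpus, wherever it ends up living
    expected   -- manifest mapping relative file path to expected MIME type
    identify   -- callback returning the detected MIME type, or None
    """
    mismatches = []
    for rel_path, want in sorted(expected.items()):
        data = (Path(corpus_dir) / rel_path).read_bytes()
        got = identify(data)
        if got != want:
            mismatches.append((rel_path, want, got))
    return mismatches
```

Keeping the manifest next to the corpus would let the corpus live outside the 
bro repo while the checker itself stays in the test suite.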

> Investigate replacing libmagic w/ signatures for file identification
> --------------------------------------------------------------------
>
>                 Key: BIT-1143
>                 URL: https://bro-tracker.atlassian.net/browse/BIT-1143
>             Project: Bro Issue Tracker
>          Issue Type: New Feature
>          Components: Bro
>    Affects Versions: git/master
>            Reporter: Jon Siwek
>            Assignee: Jon Siwek
>             Fix For: 2.3
>
>
> I think it makes sense to try to make the switch from libmagic to using Bro's 
> own signature engine for file identification before the next release.  Don't 
> want people getting used to magic file format for their own custom file 
> identification rules.



--
This message was sent by Atlassian JIRA
(v6.2-OD-10-004-WN#6253)
_______________________________________________
bro-dev mailing list
[email protected]
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev
