Hi everybody,

I'm pleased to announce that I have published both a paper on our fuzzy hashing technique and an implementation of it. You may have heard me talk about this on the Cyberspeak podcast[1], and now it's out!

The program, ssdeep, works like md5deep to create a short text signature for each input file. The signatures can be used to match other files against the original. Unlike MD5 or SHA-1, however, this algorithm can match two input files even if they are not exactly the same. Files match if they have significant homologies, that is, the same sequences of bytes in the same order. For example, if file2 is the same as file1 but with an extra 'A' appended to the end, they match. If file2 is just the first 33% of file1, they match. If file2 is just the last 33% of file1, they match. If file2 differs from file1 by lots of little changes spread throughout, however, they won't match. Fuzzy hashing is not perfect. But it is pretty cool!
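To make the matching behaviour above concrete, here is a toy sketch in Python. It is NOT ssdeep's algorithm and does not produce ssdeep-compatible signatures; it only illustrates the general idea of cutting a file into chunks at content-defined boundaries and emitting one character per chunk. The names toy_signature and toy_score, the WINDOW and TRIGGER values, the use of MD5 for the trigger and the chunk character, and the scoring rule are all assumptions made for this illustration.

```python
import hashlib
import random
from difflib import SequenceMatcher

# Toy illustration only: this is not ssdeep's algorithm or signature format.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7    # bytes of context the boundary trigger looks at (assumed value)
TRIGGER = 32  # a boundary fires roughly once every 32 bytes (assumed value)


def _chunk_char(chunk: bytes) -> str:
    """Reduce one chunk to a single signature character."""
    return ALPHABET[hashlib.md5(chunk).digest()[0] % len(ALPHABET)]


def toy_signature(data: bytes) -> str:
    """Return one character per content-defined chunk of data."""
    sig = []
    start = 0
    for i in range(len(data)):
        # The trigger depends only on the last WINDOW bytes, so chunk
        # boundaries follow the content rather than absolute offsets.
        window = data[max(0, i - WINDOW + 1):i + 1]
        trigger = int.from_bytes(hashlib.md5(window).digest()[:4], "big")
        if trigger % TRIGGER == TRIGGER - 1:
            sig.append(_chunk_char(data[start:i + 1]))
            start = i + 1
    if start < len(data):
        sig.append(_chunk_char(data[start:]))  # trailing partial chunk
    return "".join(sig)


def toy_score(sig_a: str, sig_b: str) -> float:
    """Fraction of the shorter signature found, in order, in the other."""
    shorter = min(len(sig_a), len(sig_b))
    if shorter == 0:
        return 0.0
    # autojunk=False: otherwise difflib ignores frequent characters
    # once a sequence reaches 200 items.
    matcher = SequenceMatcher(None, sig_a, sig_b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / shorter


rng = random.Random(2006)
file1 = bytes(rng.randrange(256) for _ in range(8192))
third = len(file1) // 3

variants = {
    "extra 'A' appended": file1 + b"A",
    "first third only": file1[:third],
    "last third only": file1[-third:],
    # Invert every fourth byte: many small changes spread throughout.
    "lots of little changes": bytes(
        b ^ 0xFF if i % 4 == 0 else b for i, b in enumerate(file1)
    ),
}

sig1 = toy_signature(file1)
for name, file2 in variants.items():
    same_md5 = hashlib.md5(file1).digest() == hashlib.md5(file2).digest()
    score = toy_score(sig1, toy_signature(file2))
    print(f"{name:24s} md5 equal: {same_md5}  toy score: {score:.2f}")
```

In this sketch an appended byte or a truncation disturbs only the chunks next to the edit, so most signature characters survive, while widespread small changes alter nearly every chunk. The score is normalised by the shorter signature so that a fragment of a file can still score highly against the whole; that is a choice made for this toy, not a description of how ssdeep scores matches.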

You'll find the program at http://ssdeep.sourceforge.net/ and the full academic paper at http://dfrws.org/2006/proceedings/12-Kornblum.pdf.

Let me know if you have any questions!

[1] The Cyberspeak podcast can be found at http://cyberspeak.libsyn.com/index.php?post_id=115142

--
Jesse

