Fuzzy Hashing

Jesse Kornblum Thu, 24 Aug 2006 20:07:34 -0700

Hi everybody,

I'm pleased to announce that I have published both a paper and an implementation for our fuzzy hashing. You may have heard me talk about this on the Cyberspeak podcast[1], and now it's out!

The program, ssdeep, works like md5deep to create a short text signature for each input file. The signatures can be used to match other files against the original. Unlike MD5 or SHA-1, however, this algorithm can match two input files even if they are not exactly the same. Files match if they have significant homologies, that is, the same sequences of bytes in the same order. For example, if file2 is the same as file1 but with an extra 'A' appended to the end, they match. If file2 is just the first 33% of file1, they match. If file2 is just the last 33% of file1, they match. If there are lots of little changes scattered between file1 and file2, however, they won't match. Fuzzy hashing is not perfect. But it is pretty cool!
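
To make the matching behaviour above concrete, here is a toy sketch in Python. It is not ssdeep's actual algorithm or signature format (the paper describes the real one). It only illustrates the general idea of splitting a file at content-dependent points, hashing each piece to one character, and comparing the resulting strings. The names toy_signature and similarity, the constants WINDOW and BLOCK_SIZE, the byte-sum trigger, and the use of difflib for scoring are all assumptions made for this illustration.

```python
import hashlib
import random
from difflib import SequenceMatcher

# All values below are illustrative assumptions, not ssdeep's parameters.
WINDOW = 7        # bytes of context examined when looking for a piece boundary
BLOCK_SIZE = 64   # a boundary fires roughly once per BLOCK_SIZE bytes
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def _piece_char(piece: bytes) -> str:
    """Reduce one piece of the file to a single signature character."""
    digest = hashlib.md5(piece).digest()
    return ALPHABET[digest[0] % len(ALPHABET)]


def toy_signature(data: bytes) -> str:
    """Build a short text signature, one character per piece.

    A piece ends wherever the sum of the last WINDOW bytes hits a trigger
    value. Because the trigger depends only on nearby content, a local
    change disturbs only the pieces around it.
    """
    pieces = []
    start = 0
    for i in range(len(data)):
        window = data[max(0, i - WINDOW + 1): i + 1]
        if sum(window) % BLOCK_SIZE == BLOCK_SIZE - 1:
            pieces.append(_piece_char(data[start:i + 1]))
            start = i + 1
    if start < len(data):
        # Whatever is left after the last boundary forms the final piece.
        pieces.append(_piece_char(data[start:]))
    return "".join(pieces)


def similarity(sig_a: str, sig_b: str) -> int:
    """Score two signatures from 0 (nothing shared) to 100 (identical)."""
    matcher = SequenceMatcher(None, sig_a, sig_b, autojunk=False)
    return round(100 * matcher.ratio())


if __name__ == "__main__":
    rng = random.Random(2006)
    file1 = bytes(rng.randrange(256) for _ in range(4096))
    file2 = file1 + b"A"                  # file1 with an extra 'A' appended
    head = file1[: len(file1) // 3]       # roughly the first 33% of file1
    print("appended 'A':", similarity(toy_signature(file1), toy_signature(file2)))
    print("first third: ", similarity(toy_signature(file1), toy_signature(head)))
```

In this sketch, appending a byte can only affect the final piece, so almost every signature character is unchanged. Many small edits scattered through the file would touch most pieces and change most characters, which is why that case fails to match.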

You'll find the program at http://ssdeep.sourceforge.net/ and the full academic paper at http://dfrws.org/2006/proceedings/12-Kornblum.pdf.


Let me know if you have any questions!

[1] The Cyberspeak podcast can be found at http://cyberspeak.libsyn.com/index.php?post_id=115142


--
Jesse
