On Wednesday 15 February 2006 05:21, David Kovar wrote:
>
> 2) Ability to develop a "finger print" of a particular writing style and
> search for it. This sort of thing has been done to find other works by
> authors, or to search for copyright violations.
David,

In his presentation at What The Hack[1], Rudi Cilibrasi[2] described
techniques for grouping things (music, animals, literature) by clustering
them according to how well they compress together.

In his paper[3], he gives some examples in which Russian literature was
grouped by the original author when the Russian texts were tested, but by
the translator when the English translations were tested.

You might want to take a look at his CompLearn software[4] - it would
probably make a good starting point if you're looking to develop your own
tool to look at IRC/chat-rooms.

Cheers,

Steve.

[1] http://program.whatthehack.org/event/101.de.html
[2] http://cilibrar.com/
[3] http://www.cwi.nl/~paulv/papers/cluster.pdf
[4] http://www.complearn.org/

-- 
--------------------------------------------------------------
Steve Wilson
Senior Security Consultant
QinetiQ, St Andrews Road
Malvern, WR14 3PS
Tel: (01684 89) 4153
Fax: (01684 89) 7417
---------------------------------------------------------------
'The views expressed herein are entirely those of the writer and do not
represent the views, policy or understanding of any other person or
official body.'
---------------------------------------------------------------
'The information contained in this e-mail and any subsequent
correspondence is private and is intended solely for the intended
recipient(s). For those other than the intended recipient any disclosure,
copying, distribution, or any action taken or omitted to be taken in
reliance on such information is prohibited and may be unlawful.'
---------------------------------------------------------------
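
For anyone who wants to experiment before trying CompLearn itself, here is a
minimal sketch of the compression-based distance that the paper in [3] builds
on, the normalized compression distance (NCD). It is not CompLearn's code: the
choice of zlib as the compressor, the function names and the sample texts are
all assumptions made for illustration. Real chat logs would need much longer
samples per author to give meaningful results.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Length of data after compression (zlib is an assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C() is the compressed size. Values near 0 suggest the inputs
    share a lot of structure; values near 1 suggest they share little.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def nearest(sample: bytes, references: Mapping[str, bytes]) -> str:
    """Return the label of the reference text closest to sample (hypothetical helper)."""
    return min(references, key=lambda name: ncd(sample, references[name]))


if __name__ == "__main__":
    # Toy data only: stand-ins for text collected from two known writers.
    known = {
        "writer_a": b"the quick brown fox jumps over the lazy dog " * 20,
        "writer_b": bytes(range(256)) * 4,
    }
    unknown = b"the quick brown fox jumps over the lazy dog " * 10
    print(nearest(unknown, known))
```

Computing ncd() for every pair of samples gives a distance matrix, which can
then be passed to whatever clustering method you prefer. Note that zlib only
looks back over a 32 KB window, so a compressor such as bz2 or lzma may be a
better fit for larger samples.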
