Good evening,

We are fencing with someone who's providing us with some clues as to their personality and writing style. It seems possible that we could find traces of him/her in forums, chat rooms, and IRC channels if we applied appropriate tools and procedures. We've started down this path on our own, but we're looking for any papers or books that might help with the process.
What we've done to date:

1) As a group, developed a rough personality profile to help guide us in deciding where to look.
2) Set up an IRC logging tool and set it running on some likely IRC channels.
3) Set up a web mirroring tool to archive a couple of potentially interesting web sites. For some we keep only a current snapshot; for others we keep a daily snapshot.
4) Installed a local indexing tool, running over the IRC logs and the mirroring output.
5) Googled for various pieces of what appears to be his/her writing style. Since the style includes punctuation and spacing, which Google largely ignores, we're hitting a bit of a wall.

So, for the moment, this is just data collection. We are trying to figure out how best to analyze this data, and where else to look.

Here are some examples of things that might help. This is certainly not an exhaustive list:

1) Ability to search for a specific photograph on the 'net, or for a checksum of one. Checksums of photographs seem to be used often in child pornography cases. httrack, which we're using to mirror sites, can produce md5 checksums of images during the process.
2) Ability to develop a "fingerprint" of a particular writing style and search for it. This sort of thing has been done to find other works by an author, or to search for copyright violations.
3) Log and analyze IRC traffic. We've found tools that do one or the other, but nothing that is quite right.
4) Search engines that include non-standard information in their indexes. We've used spiders to monitor interesting web sites on our own, but this adds load to the target and leaves a trail.

Thanks very much.

-David
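
P.S. To make the photograph-checksum idea above a bit more concrete, here is a minimal sketch of what we have in mind for the mirrored data. It is only an illustration under assumptions: the function names, the list of image suffixes, and the example directory and checksum in the closing comment are all hypothetical, and it hashes the files itself with Python's standard hashlib rather than relying on anything httrack writes out.

```python
import hashlib
from pathlib import Path

# Assumption: these suffixes cover the image types we care about.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif"}


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_matching_images(mirror_root, known_md5s):
    """Walk a mirror directory and list images whose MD5 is in known_md5s.

    Returns a list of (path, checksum) tuples, sorted by path.
    """
    wanted = {value.lower() for value in known_md5s}
    matches = []
    for path in sorted(Path(mirror_root).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            checksum = md5_of_file(path)
            if checksum in wanted:
                matches.append((str(path), checksum))
    return matches


# Hypothetical usage (directory name and checksum are placeholders):
# find_matching_images("mirror/", ["d41d8cd98f00b204e9800998ecf8427e"])
```

One limitation worth noting: a checksum only matches byte-identical copies, so a photograph that has been resized, re-encoded, or had its metadata edited would not be found this way.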
