Good evening,

We are fencing with someone who has been giving us some clues about their
personality and writing style. It seems possible that, with the right tools
and procedures, we could find traces of them in forums, chat rooms, and IRC
channels. We've started down this path on our own but are looking for any
papers or books that might help with the process.

What we've done to date:

1) As a group, developed a rough personality profile to help us decide
where to look.
2) Set up an IRC logging tool and left it running on some likely IRC channels.
3) Set up a web mirroring tool to archive a few potentially interesting
web sites. For some we keep only the current snapshot; for others we keep
a daily snapshot.
4) Installed a local indexing tool and pointed it at the IRC logs and the
mirrored sites.
5) Googled for distinctive fragments of what appears to be their writing
style. Since that style shows up largely in punctuation and spacing, which
Google mostly ignores, we're hitting a bit of a wall.
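Because the archive from steps 2-4 is local, punctuation and spacing quirks
can be searched exactly instead of going through a search engine's
tokenizer. A minimal sketch in Python, assuming the IRC logs and mirrored
pages sit as plain files under one directory; the function name, the
`archive` path, and the example quirk pattern are hypothetical:

```python
import os
import re
from typing import Iterator, Tuple


def find_exact(root: str, pattern: str,
               use_regex: bool = False) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for each line under `root` matching `pattern`.

    Punctuation and whitespace are matched literally (or via a regex when
    use_regex=True), unlike a web search engine that discards them.
    """
    regex = re.compile(pattern if use_regex else re.escape(pattern))
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps odd encodings in IRC logs from aborting the scan
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if regex.search(line):
                            yield path, lineno, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it and keep going


if __name__ == "__main__":
    # Hypothetical quirk: a space before a comma, e.g. "word ,word"
    for path, lineno, line in find_exact("archive", r"\w ,\w", use_regex=True):
        print(f"{path}:{lineno}: {line}")
```

Matching is line by line, so a phrase split across lines, or broken up by
HTML tags in a mirrored page, will be missed unless the text is normalized
(for example, tags stripped) first.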

So, for the moment, this is just data collection. We are trying to figure
out how best to analyze this data and where else to look.

Here are some examples of things that might help. This is certainly not an
exhaustive list:

1) Ability to search the 'net for a specific photograph, or for its
checksum. Image checksums seem to be used often in child pornography
cases. httrack, which we're using to mirror sites, can produce MD5
checksums of images during the process.

2) Ability to develop a "fingerprint" of a particular writing style and
search for it. This sort of thing has been done to attribute works to
known authors and to find copyright violations.

3) Log and analyze IRC traffic. We've found tools that do one or the
other, but nothing that does both quite right.

4) Search engines that index non-standard information, such as punctuation
and spacing. We've used our own spiders to monitor interesting web sites,
but this adds load to the target and leaves a trail in its logs.
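For item 1, matching images in a local mirror against known checksums is
straightforward. A minimal sketch in Python, assuming the mirror is a
directory tree and the wanted MD5 digests are already known; the function
names and the list of image extensions are assumptions:

```python
import hashlib
import os
from typing import Dict, Iterable, List

# Assumed set of extensions worth hashing; adjust for the mirrored sites.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_matching_images(root: str, wanted: Iterable[str]) -> Dict[str, List[str]]:
    """Map each wanted digest found under `root` to the image paths that have it."""
    wanted_set = {w.lower() for w in wanted}  # hexdigest() is lowercase
    matches: Dict[str, List[str]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in IMAGE_EXTENSIONS:
                continue
            path = os.path.join(dirpath, name)
            digest = md5_of_file(path)
            if digest in wanted_set:
                matches.setdefault(digest, []).append(path)
    return matches
```

An exact checksum only matches byte-identical copies: resizing,
recompression, or editing the image metadata produces a different digest.
Finding altered copies needs a perceptual (similarity-based) hash instead.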
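For item 2, a common baseline in authorship attribution is to compare
character n-gram frequency profiles, which pick up punctuation and spacing
habits without any special handling. A minimal sketch in Python, assuming
plain-text samples; the function names, n = 3, and cosine similarity are
illustrative choices rather than any particular published method:

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def ngram_profile(text: str, n: int = 3) -> Counter:
    """Count character n-grams; punctuation and whitespace are deliberately kept."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two profiles (scale-invariant, so raw counts suffice)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a sample shorter than n has no n-grams
    return dot / (norm_a * norm_b)


def rank_candidates(known: str, candidates: Dict[str, str],
                    n: int = 3) -> List[Tuple[str, float]]:
    """Rank candidate texts by similarity to the known sample, best first."""
    ref = ngram_profile(known, n)
    scores = {name: cosine_similarity(ref, ngram_profile(text, n))
              for name, text in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In use, `known` would be text already attributed to the person and
`candidates` might map each IRC nick or forum handle to everything that
nick wrote in the logs (a hypothetical setup). Short samples give noisy
scores, so a high rank is a lead to follow up, not an identification.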


Thanks very much.

-David

