> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
> Behalf Of J. van Baardwijk

[snip]

> How do you make such an analysis? The first thing that comes to mind is
> going through all the messages one by one, and creating a table of who
> replied to whom. But with very active communities, certainly when those
> communities also have a large number of members, that method would take
> forever and a day.

True.  It's called link analysis and it is often used in law enforcement,
but with a much, much smaller community (of criminals, usually).

I actually worked on it a bit this morning and I have a program running
right now to do pretty much what you describe.  However, all I'm looking at
is the beginning of each thread -- who started it and who the first five
responders are.  And I've limited it to messages posted since 1/1/2001.
It's only doing a bit better than one thread a second.  There are 1,902
threads (this includes one-message "threads") since that date.  Should take
a half hour or so.  I recently moved the database onto its own SCSI-III
disk (160 MB/sec, 5 ms access), so where it counts, this system is very
fast for a desktop machine.
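
A rough sketch of that kind of tally, not the actual program: it assumes
each message is a (thread_id, sender, posted) tuple, which is a made-up
layout, and it reads "first five responders" as the first five distinct
people other than the starter.  The sample data and the name
starter_links are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical input: (thread_id, sender, posted) tuples; not the real schema.
messages = [
    (1, "alice", 1), (1, "bob", 2), (1, "carol", 3), (1, "bob", 4),
    (2, "bob", 1), (2, "alice", 2),
    (3, "carol", 1),  # a one-message "thread"
]

def starter_links(messages, max_responders=5):
    """Count (starter, responder) pairs over each thread's first responders."""
    by_thread = defaultdict(list)
    for thread_id, sender, posted in messages:
        by_thread[thread_id].append((posted, sender))

    links = Counter()
    for posts in by_thread.values():
        posts.sort()  # oldest message first
        starter = posts[0][1]
        responders = []
        for _, sender in posts[1:]:
            # Assumption: skip the starter and repeat posters.
            if sender != starter and sender not in responders:
                responders.append(sender)
            if len(responders) == max_responders:
                break
        for responder in responders:
            links[(starter, responder)] += 1
    return links

links = starter_links(messages)
```

The resulting counter is the "who replied to whom" table in miniature:
each key is a (starter, responder) pair and each value is the number of
threads in which that pairing occurred.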

> (And once you are finished and proudly present the results, you find out
> that nobody cares...)

Doesn't matter to me.  *I* care.

> The second thing that comes to mind is that there must be some piece of
> software available, but that would have to be quite intelligent (and
> therefore expensive) software if it has to interpret posts all by itself.

Yep.  There are link analysis tools, but you have to prep the data first.
I've never used the commercial ones.  They are, indeed, quite expensive.

I keep thinking there's a way to exclude the one-message threads easily, but
it is eluding me.  I guess it would be a join between my threads table and a
table generated with a query that counts the messages per subject and keeps
only the subjects with more than one.  That probably isn't clear unless you
speak SQL...
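
One way that join might look, as a sketch only: the threads and messages
tables, their columns, and the sample rows are all hypothetical, and it
assumes replies carry exactly the same subject as the first post (no
"Re:" prefix left in).  It is run here against an in-memory SQLite
database just so it can be tried as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and sample rows; the real tables are not shown.
conn.executescript("""
    CREATE TABLE threads  (subject TEXT PRIMARY KEY, starter TEXT);
    CREATE TABLE messages (subject TEXT, sender TEXT);
    INSERT INTO threads VALUES ('Link analysis', 'alice'), ('Lone post', 'carol');
    INSERT INTO messages VALUES
        ('Link analysis', 'alice'), ('Link analysis', 'bob'),
        ('Lone post', 'carol');
""")

# Join threads to a derived table of subjects that have more than one
# message; one-message threads have no match and drop out of the result.
rows = conn.execute("""
    SELECT t.subject, t.starter, m.n
    FROM threads AS t
    JOIN (SELECT subject, COUNT(*) AS n
          FROM messages
          GROUP BY subject
          HAVING COUNT(*) > 1) AS m
      ON m.subject = t.subject
    ORDER BY t.subject
""").fetchall()
```

The subquery does the counting, so a plain list of distinct subjects is
not enough by itself; it is the HAVING clause that filters out the
singletons.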

Nick
