I've done all this already.  But I can't say when it'll be publicly
available.  I've been re-writing it in Python (along with SQL, of course)
and I'm whipping up some new analysis.

Jeroen, what format do you have the messages in right now?  As I think you
know, I've scraped them off the Yahoo archives, but if you have the complete
e-mails, with headers and such, I could do a much better job.
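
To illustrate what the full headers would add over scraped archive pages, here
is a rough sketch using Python's standard email module.  It is only an
illustration, not the actual archiving script: the sample message, its
addresses and IDs, the time zone, and the helper name extract_thread_fields
are all made up.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical sample message; the addresses, IDs and time zone are invented.
RAW = """\
From: Alice <alice@example.com>
To: list@example.com
Subject: Re: Statistics Page
Date: Sat, 04 May 2002 14:58:00 -0700
Message-ID: <reply-1@example.com>
In-Reply-To: <root-1@example.com>
References: <root-1@example.com>

Body text here.
"""


def extract_thread_fields(raw_text):
    """Pull out the headers that matter for threading and timing."""
    msg = message_from_string(raw_text)
    # References is a whitespace-separated list of ancestor Message-IDs.
    references = (msg.get("References") or "").split()
    return {
        "message_id": msg.get("Message-ID"),
        "in_reply_to": msg.get("In-Reply-To"),
        "references": references,
        "sender": msg.get("From"),
        # Timezone-aware datetime parsed from the Date header.
        "sent": parsedate_to_datetime(msg["Date"]),
    }


fields = extract_thread_fields(RAW)
```

With Message-ID and In-Reply-To available, a reply can be linked to its parent
directly instead of guessing from subject lines, which is about all a scraped
page allows.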

FYI, my old database had 10 million records, of which Brin-L was just a
little bit.  The new one now has 6 million.  No real performance issues yet,
though some things have been difficult to optimize.  Threading efficiently,
calculating each person's interval between posts, and that sort of thing have
been the more challenging ones.
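
As a minimal sketch of the interval calculation, assuming the posts have
already been pulled out of the database as (sender, timestamp) rows, something
like the following would do it in plain Python.  The sample rows and the
function name post_intervals are hypothetical, and this says nothing about how
to make it fast in SQL, which is the hard part.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (sender, timestamp) rows, as a query might return them.
POSTS = [
    ("alice", datetime(2002, 5, 1, 9, 0)),
    ("bob", datetime(2002, 5, 1, 10, 0)),
    ("alice", datetime(2002, 5, 1, 15, 0)),
    ("alice", datetime(2002, 5, 2, 9, 0)),
    ("bob", datetime(2002, 5, 3, 10, 0)),
]


def post_intervals(posts):
    """Return {sender: [timedelta, ...]} between consecutive posts."""
    by_sender = defaultdict(list)
    # Sort once by time so each sender's list ends up in chronological order.
    for sender, sent in sorted(posts, key=lambda p: p[1]):
        by_sender[sender].append(sent)
    return {
        sender: [later - earlier for earlier, later in zip(times, times[1:])]
        for sender, times in by_sender.items()
    }


intervals = post_intervals(POSTS)

# Mean gap in hours for each sender with at least two posts.
mean_hours = {
    sender: sum(gap.total_seconds() for gap in gaps) / len(gaps) / 3600
    for sender, gaps in intervals.items()
    if gaps
}
```

Sorting once and grouping in memory is fine for a sketch; at millions of rows
the same idea would more likely be pushed into the database with an index on
sender and date.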

I'm using MySQL -- there's really no need for Oracle.  You can even use
Access as a front end to MySQL.  I've done that, and the same with Excel.
They're handy for trying out queries and examining the results list
carefully.

If I ever get that dang machine up for you to use, I'll install MySQL on it and
even set it up with my Python scripts for archiving and such.  Trouble is,
we're moving in a week and I can't seem to convince the machine to read the
Linux CDs I burned for it, at least not at boot time.  Too many things to
do...

Nick

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
> Behalf Of [EMAIL PROTECTED]
> Sent: Saturday, May 04, 2002 2:58 PM
> To: Brin-L
> Subject: RE: Brin-l Statistics Page
>
>
> Any database that uses SQL would work.  The learning curve for SQL isn't
> that steep either, though you might have to find a database server that's
> within your budget.  A decent database server can handle 2^20 records
> quickly and efficiently.  Then you'll just have to write a script to input
> and output from the database to HTML periodically for static HTML.  Dynamic
> server side scripting would be even easier, if you control the web server
> itself.
>
> If you want to hire me to do it I would -- I haven't been working for the
> past few months.  E-mail me off-list.
