Earl is correct. MHonArc computation is not the bottleneck; disk 
space is. Latency is still under 24 hours if things run at full tilt,
but that doesn't happen when your disk I/O system is spending
ages searching for the few free inodes amongst the billions already
used.
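
For anyone wanting to check whether a filesystem is in the same state, here is a minimal sketch that reports inode usage. It assumes a POSIX system where Python's `os.statvfs` is available; the function name `inode_usage` and the path checked are illustrative only and are not part of the mail-archive.com setup.

```python
import os

def inode_usage(path="."):
    """Return (total, free, used_fraction) inodes for the filesystem holding path."""
    st = os.statvfs(path)  # POSIX only
    total = st.f_files     # total inodes on the filesystem
    free = st.f_ffree      # free inodes
    # Some filesystems allocate inodes dynamically and report 0 total.
    used_fraction = (total - free) / total if total else 0.0
    return total, free, used_fraction

if __name__ == "__main__":
    total, free, used = inode_usage("/")
    print(f"inodes: {total} total, {free} free, {used:.1%} used")
```

A used fraction close to 1.0 would be consistent with the slow search for free inodes described above.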

We do use a lot of CPU -- split evenly between MHonArc and HtDig
indexing. Disk space is also split evenly between HTML and MHonArc 
index files (raw mail has long since been offloaded from the system).

I've considered data compression on unchanging archives (i.e.
mod_gzip on the HTML pages) but that just trades disk for CPU 
resources, and complicates the software. Can't do it on active
archives or htdig indexing becomes expensive.
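
The disk-for-CPU trade can be measured roughly as follows. This is only a sketch: the sample page is made up and highly repetitive, the helper name `compression_tradeoff` is hypothetical, and real archive pages will compress less well.

```python
import gzip
import time

def compression_tradeoff(data: bytes, level: int = 6):
    """Return (original_size, compressed_size, seconds_elapsed) for gzip."""
    start = time.perf_counter()
    packed = gzip.compress(data, compresslevel=level)
    elapsed = time.perf_counter() - start
    return len(data), len(packed), elapsed

# Made-up stand-in for an archived HTML page (an assumption, not real data).
sample = b"<html><body>" + b"<p>archived message text</p>\n" * 500 + b"</body></html>"

orig_size, packed_size, secs = compression_tradeoff(sample)
print(f"{orig_size} bytes -> {packed_size} bytes in {secs:.6f}s")
```

The saved bytes are paid for again each time a page has to be decompressed, which is why this only looks attractive for archives that no longer change.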

> Should there be message expiration [?]

Yeah, I think we've hit the runaway success point where it is
helpful. I think I am going to limit maximum archive size to a 
few thousand messages after the new hardware is in place and there 
is some breathing room.
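
A size cap like this could be sketched as below. The function `expire_messages`, the `(message_id, received)` tuple shape, and the default of 3000 are all assumptions for illustration; "a few thousand" is the only figure given, and this is not how the archive software actually does it.

```python
from datetime import datetime

def expire_messages(messages, max_size=3000):
    """Split messages into (kept, expired), keeping the newest max_size.

    messages: iterable of (message_id, received) tuples (hypothetical shape).
    """
    # Sort newest first so the slice boundary falls on the oldest messages.
    ordered = sorted(messages, key=lambda m: m[1], reverse=True)
    return ordered[:max_size], ordered[max_size:]

# Usage with made-up messages and a tiny cap:
msgs = [
    ("a", datetime(2000, 1, 1)),
    ("b", datetime(2000, 1, 3)),
    ("c", datetime(2000, 1, 2)),
]
kept, expired = expire_messages(msgs, max_size=2)
print([m[0] for m in kept], [m[0] for m in expired])
```

Expiring by count rather than by age keeps the per-archive disk footprint bounded no matter how busy a list is.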

> What mail-archive.com is experiencing now is the resource limitations
> that a single individual can provide.  It appears mail-archive has
> grown bigger than what Jeff ever thought it would.  Since there are
> several open source projects that utilize the service, it would be
> nice that some contribution, like in resources, were provided to
> mail-archive to avoid problems like the current situation.

That's fundamentally the issue. I'm at the very edge of what I can
provide as a hobbyist, considering that the next generation of hardware
that makes sense stores 1 to 2 TB (at roughly $6K/TB), and I still have
no good place to put such a machine on the net.

-Jeff



_______________________________________________
Gossip mailing list
[EMAIL PROTECTED]
http://jab.org/cgi-bin/mailman/listinfo/gossip
