> As for technical solutions:
> 
> * Archives that have not received a new message over a certain period
>   of time could be targeted for deletion.  I am sure there are archives
>   that are no longer used, so they could be removed.  The key is to
>   determine what is the proper period.

I'm already de-indexing them from the list of lists after
six months of inactivity. Deletion of defunct lists is probably
quite reasonable.
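One way to find candidates for an inactivity rule is to look at the newest file in each archive. This is only a sketch, assuming a hypothetical layout where every list's archive is its own directory under a common root and each message file gets its modification time set when it is written; the 183-day threshold is an approximation of six months. It uses mtime rather than atime, so it should still work with atime updates turned off.

```python
import os
import time

# Hypothetical threshold: six months, approximated as 183 days.
INACTIVE_SECONDS = 183 * 24 * 60 * 60


def newest_mtime(archive_dir):
    """Return the latest modification time of any file under archive_dir, or None if empty."""
    newest = None
    for dirpath, _dirnames, filenames in os.walk(archive_dir):
        for name in filenames:
            mtime = os.path.getmtime(os.path.join(dirpath, name))
            if newest is None or mtime > newest:
                newest = mtime
    return newest


def inactive_archives(root, now=None, threshold=INACTIVE_SECONDS):
    """List archive directories directly under root with no file newer than threshold.

    Assumes (hypothetically) one directory per list archive under root.
    Empty archives are reported as inactive too.
    """
    now = time.time() if now is None else now
    stale = []
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if not os.path.isdir(path):
            continue
        newest = newest_mtime(path)
        if newest is None or now - newest > threshold:
            stale.append(entry)
    return stale
```

Walking every file of a very large archive is slow; if re-rendering ever rewrites old message files, their mtimes will no longer reflect when the message arrived, and a different signal (such as the date of the newest message) would be needed.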

> * Related to the previous one is to delete archives that have not
>   been accessed over a certain period of time.  Let usage determine what
>   should and should not stay.  Robot hits should be excluded.  Some
>   heuristics may need to be employed since some robots do not play
>   nice (like address harvesters).

Very hard. 

  * Robot identification is hard
  * Robot traffic is high
  * Apache logs are enormously large (so I don't keep a long
    history)
  * I've turned off the "atime" records in the filesystem
    for improved performance.
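The usage-based idea could at least be approximated over whatever short window of Apache logs is kept. This is a rough sketch, assuming the standard "combined" log format and a hypothetical URL layout in which the first path segment names the archive. It drops robots with two crude heuristics (a user-agent pattern and any host that fetches /robots.txt), which, as noted above, will not catch harvesters that disguise themselves.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical heuristic: substrings that self-identified robots often put in their user-agent.
ROBOT_AGENT_RE = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def archive_hits(lines):
    """Count successful non-robot GETs per archive (first path segment, hypothetical layout)."""
    parsed = [m for m in (LOG_RE.match(line) for line in lines) if m]
    # Any host that asks for /robots.txt is treated as a robot, even with a browser-like agent.
    robot_hosts = {m["host"] for m in parsed if m["path"] == "/robots.txt"}
    hits = Counter()
    for m in parsed:
        if m["method"] != "GET" or not m["status"].startswith("2"):
            continue
        if m["host"] in robot_hosts or ROBOT_AGENT_RE.search(m["agent"]):
            continue
        # Strip any query string, then take the first path segment as the archive name.
        segment = m["path"].split("?", 1)[0].lstrip("/").split("/", 1)[0]
        if segment:
            hits[segment] += 1
    return hits
```

Archives that never show up in the counter over the retained log window would be the candidates; because the window is short, a low count is weak evidence on its own.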
 
> * Remove archives that are just duplicates of a list's "official"
>   archives (this would actually affect me :-)  For example,
>   I see that there are several cygwin.com lists archived at
>   mail-archive.com, but cygwin.com keeps their own set of archives
>   at <http://cygwin.com/lists.html>.

Hard to identify. But banning all of YahooGroups was one step
in this direction.
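Banning a whole hosting service is much easier to automate than spotting duplicates of an official archive. A minimal sketch, assuming the check runs against each list's posting address; the ban set is hypothetical and holds only yahoogroups.com to mirror the YahooGroups ban.

```python
# Hypothetical ban list; "yahoogroups.com" stands in for the YahooGroups ban.
BANNED_DOMAINS = {"yahoogroups.com"}


def is_banned(list_address, banned=BANNED_DOMAINS):
    """True if the list's posting address is in a banned domain or one of its subdomains."""
    _local, sep, domain = list_address.rpartition("@")
    if not sep:
        return False  # not an address at all
    domain = domain.lower().rstrip(".")
    return any(domain == d or domain.endswith("." + d) for d in banned)
```

Matching on the domain suffix (with the leading dot) catches subdomains without also matching unrelated domains that merely end in the same letters.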

> What is the space limitation of your current hosting provider?

My current hosting provider is donating co-location service, but I can't
upgrade to a bigger machine. I have about 1.5 terabytes there at the
moment.

-Jeff




_______________________________________________
Gossip mailing list
[EMAIL PROTECTED]
http://jab.org/cgi-bin/mailman/listinfo/gossip
