The value of a set of archives is often unrelated to its age, or even
whether the particular mailing list is still in existence.  For example,
the EDI-L mailing list moved from a listserve mailing list hosted at the
Univ. of California to Yahoo for various reasons.  The Yahoo portion
obviously can't be archived here, but the older messages at are still
referred to and treasured, especially those things I write about re:
international standards used in e-commerce.

William J. Kammerer
Novannet, LLC.

----- Original Message -----
From: "Jeff Breidenbach" <[EMAIL PROTECTED]>
Sent: Sunday, 23 June, 2002 02:56 PM
Subject: Re: [Gossip] More Missing messages OR ....Does everything have
to be done at once?

> As for technical solutions:
> * Archives that have not received a new message over a certain period
>   of time could be targeted for deletion.  I am sure there are some
>   that are no longer used, so they could be removed.  The key is to
>   determine what is the proper period.

I'm already de-indexing them from the list of lists after
six months of inactivity. Deletion of defunct lists is probably
quite reasonable.
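
As a rough illustration of the inactivity check (a sketch only, not the
actual archiver code), the following assumes a hypothetical layout with
one directory per list under a base directory, and treats the newest
file modification time as the date of the last message.  The names
SIX_MONTHS, newest_mtime and inactive_archives are made up for this
example.

```python
import os
import time

SIX_MONTHS = 183 * 24 * 3600  # assumed cutoff, in seconds


def newest_mtime(archive_dir):
    """Return the most recent modification time of any file under archive_dir.

    An archive with no files at all returns 0.0, so it counts as inactive.
    """
    newest = 0.0
    for root, _dirs, files in os.walk(archive_dir):
        for name in files:
            newest = max(newest, os.path.getmtime(os.path.join(root, name)))
    return newest


def inactive_archives(base_dir, max_age=SIX_MONTHS, now=None):
    """List archive directories with no file modified within max_age seconds."""
    now = time.time() if now is None else now
    stale = []
    for entry in sorted(os.listdir(base_dir)):
        path = os.path.join(base_dir, entry)
        if os.path.isdir(path) and now - newest_mtime(path) > max_age:
            stale.append(entry)
    return stale
```

This relies on mtime only, so it still works with "atime" records
turned off.  It reports candidates and deletes nothing.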

> * Related to the previous one is to delete archives that have not
>   been accessed over a certain period of time.  Let usage determine
>   what should and should not stay.  Robot hits should be excluded.  Some
>   heuristics may need to be employed since some robots do not play
>   nice (like address harvesters).

Very hard.

  * Robot identification is hard
  * Robot traffic is high
  * Apache logs are enormously large (so I don't keep a long history)
  * I've turned off the "atime" records in the filesystem
    for improved performance.
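
For what it is worth, a naive first pass at the usage idea might look
like the sketch below.  It assumes Apache "combined" format log lines
and a hypothetical URL layout where the first path component names the
archive.  The ROBOT_HINTS list is an illustrative guess, and it only
catches robots that identify themselves in the User-Agent, so it does
nothing about the badly behaved ones.

```python
import re
from collections import Counter

# Matches the request, status, size, referer and user-agent fields of a
# combined-format log line.  Group 1 is the first path component (assumed
# to be the archive name), group 2 is the User-Agent string.
LOG_RE = re.compile(
    r'"(?:GET|HEAD) /([^/ ]+)/[^ ]* [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)

# Illustrative substrings only; real robots often do not announce themselves.
ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")


def human_hits(lines):
    """Count hits per archive, skipping requests from self-identified robots."""
    hits = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if not match:
            continue  # not a GET/HEAD for an archive page, or malformed line
        archive, agent = match.group(1), match.group(2).lower()
        if any(hint in agent for hint in ROBOT_HINTS):
            continue
        hits[archive] += 1
    return hits
```

Since it takes any iterable of lines, it can read a log file as a
stream instead of loading the whole thing into memory.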

> * Remove archives that are just duplicates of a list's "official"
>   archives (this would actually affect me :-)  For example,
>   I see that there are several lists archived at
>, but keeps their own set of archives
>   at <>.

Hard to identify. But banning all of YahooGroups was one step
in this direction.

> What is the space limitation of your current hosting provider?

Current hosting provider is donating co-location service, but I can't
swap out to a bigger machine. I have about 1.5 terabits there at the


Gossip mailing list
