On October 31, 2000 at 13:10, "Mathias Körber" wrote:

> My requirements:
>       a) The system should be able to filter out numerous duplicate
>          emails. I know I could use formail -D for this. Does MHonARC
>          have a native detector for duplicate emails? It would be nice
>          if it could detect even duplicates that differ in their M-ID
>          (eg remails etc)

On a per-archive basis, MHonArc uses message-ids to detect duplicates.
As for the last question of the paragraph, detecting duplicates that
differ in their message-id requires heuristics that can be a real pain
to implement and will never be perfect.
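
As a rough illustration of the kind of heuristic involved, here is a
minimal Python sketch that treats two messages as duplicates when their
sender, subject, and whitespace-normalized body match, regardless of
message-id. It is not part of MHonArc; the function names
content_fingerprint and drop_duplicates are hypothetical, and the choice
of fields is an assumption that will miss remails with altered subjects
or bodies.

```python
import hashlib
import re
from email import message_from_string


def content_fingerprint(raw_message):
    """Hypothetical helper: hash the sender, subject, and normalized body.

    The Message-ID header is deliberately ignored, so a remail that only
    received a new id produces the same fingerprint.
    """
    msg = message_from_string(raw_message)
    sender = (msg.get("From") or "").strip().lower()
    subject = re.sub(r"\s+", " ", msg.get("Subject") or "").strip().lower()
    # Collect the raw payload of every non-multipart part.
    parts = [str(p.get_payload()) for p in msg.walk() if not p.is_multipart()]
    # Collapse whitespace so re-wrapped lines still compare equal.
    body = re.sub(r"\s+", " ", " ".join(parts)).strip()
    digest = hashlib.sha1("\n".join([sender, subject, body]).encode("utf-8"))
    return digest.hexdigest()


def drop_duplicates(raw_messages):
    """Keep the first message seen for each fingerprint, in input order."""
    seen = set()
    unique = []
    for raw in raw_messages:
        key = content_fingerprint(raw)
        if key not in seen:
            seen.add(key)
            unique.append(raw)
    return unique
```

A filter like this would run before the messages are handed to MHonArc,
in the same place formail -D would sit.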

>       b) The main requirement is that the archive INDEX is accessible
>          easily, so I guess it will have to reside on my server's HD
>          somewhere.
>          The index should be able to be stored on CD-R (or RW) for
>          mobility and backup purposes.
> 
>          Minimal indexing requirements:
>               Date
>               From, To, CC, Bcc, Sender and their X-equivs (real names and
>               addresses)

Recipient field information is not available on index pages.  It
will show up on message pages unless you exclude those fields.

>               Subject
>               Message-ID, References
>               Attachment filenames

References and attachment filenames are currently not available for
listing on index pages.  It can be done, but the resulting
formatting of these values can be a problem since they represent a
list of values and not a single item.

>               Some form of free-text indexes for the body would be nice but 
>               I guess it would
>                       a) create a humongous amount of data
>                       b) be difficult to implement w/o also indexing
>                          too common words (be, to, etc)
>                       c) would require some powerful searchengine to
>                          provide a useful interface..

There are several search engines out there.  This is outside
of the scope of MHonArc, but many users hook in search engines for
their archives.  Examples: htdig, glimpse, namazu.

>       c) The archive (emails) itself can be stored on multiple CDs
>          (CD-R or CD-RW). If the mails could be stored in a compressed
>           format this would be OK with me too.

MHonArc supports gzip output.

>          It would be nice if the system could split the archive
>          automatically by date (eg year/month, so they can be put
>          on CD separately). Mails that were duplicated in different
>          periods might need a link ?

Requires a pre-processor.  Can probably be done with Procmail.
Perl is an option also.
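
A minimal sketch of such a pre-processor, written in Python rather than
Procmail or Perl, might group messages into year/month buckets by their
Date header. The names period_of and group_by_period are hypothetical,
and sending messages with a missing or unparsable date to an "undated"
bucket is an assumption of this sketch.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import parsedate_to_datetime


def period_of(raw_message):
    """Return "YYYY/MM" from the Date header, or "undated" if unusable."""
    msg = message_from_string(raw_message)
    try:
        sent = parsedate_to_datetime(msg.get("Date", ""))
    except (TypeError, ValueError):
        # Missing or malformed Date header.
        return "undated"
    if sent is None:
        return "undated"
    return f"{sent.year:04d}/{sent.month:02d}"


def group_by_period(raw_messages):
    """Map each period string to the list of messages sent in that period."""
    buckets = defaultdict(list)
    for raw in raw_messages:
        buckets[period_of(raw)].append(raw)
    return dict(buckets)
```

Each bucket could then be fed to its own archive directory, one per
year/month, so that each directory can go on a separate CD.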

>       d) Multi-system (Unix, Windows) access to the archive is a must. This
>          is why I think MHonARC might be the right tool, as HTML can be
>          read by both systems.

Could restrict choice of search engine.  I know Namazu has a win32 version,
but I do not know if the index files are compatible.  Have never checked.
A Java-based search engine is an option, but I do not know if any
are available.

>       e) The archive should be extensible, so that I can pipe new mails into

MHonArc was designed to add messages to an existing archive.

> Is there maybe a better tool than MHonARC to do this?

As is usually the case, it will probably be a combination of tools.

--ewh
