=================================================
Bug #2: When one mail goes to two archived lists
=================================================

This is the problem mentioned in the FAQ. It is kind of hard and (to
me) kind of interesting. We archive two mailing lists: first_nations
and nativenews. Often someone will CC a letter to both lists, so the
system receives two copies, one from each list. Unfortunately, both
copies end up in first_nations.

Why? Two cases.

1) Our first pass of sorting heuristics works roughly like this:
   examine the incoming message, take every known list name from the
   archives, and grep each one against the headers of the incoming
   mail. As soon as we find a match, we file the mail away. Since we
   go through the names alphabetically, both letters bearing
   "To: [EMAIL PROTECTED], [EMAIL PROTECTED]" end up matching
   [EMAIL PROTECTED] first.

   It has been suggested that one approach might be to cross-check
   against envelope addresses or other headers.

   Cost: This particular sort runs several hundred times a day; on
   average it requires 125 greps and takes several seconds. So it is
   already fairly expensive, and I'd rather not see it get a lot more
   expensive. There is some leeway, since the time bottleneck is in
   the MHonArc archiving runs.

2) To improve efficiency, if any mail has queued up in the inbox, we
   gracefully switch to batch operation. Once the initial sort and list
   determination are made, we also grab any other mail in the inbox
   for that particular list and archive it all together. This is done
   with MH refile commands.

   For example, suppose we get two identical letters addressed "To:
   [EMAIL PROTECTED], [EMAIL PROTECTED]", one from each list,
   and they arrive nearly simultaneously. The first one will be sorted
   to first_nations, possibly erroneously (see case 1 above).

   Then we sweep the inbox for other first_nations mail. The MH
   refile commands grab the other message as well, so both get
   refiled to first_nations.

   Cost: Cost matters in the MH refile section, since this is the
   batch mode for when things get really busy, and any expense here
   will affect performance limits. On the other hand, MHonArc is
   still the bottleneck, so don't feel too constrained.
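To make the two failure paths concrete, here is a minimal Python sketch that mimics them. It is not the code in "mailme"; the function names, the list addresses, and the trimmed-down header strings are all hypothetical (the real addresses are redacted above). `first_match_sort` stands in for case 1 and `batch_sweep` for the MH refile pass in case 2.

```python
import re

# Hypothetical list names; the real archive knows many more.
LISTNAMES = ["nativenews", "first_nations"]

def first_match_sort(headers, listnames=LISTNAMES):
    """Case 1: grep each list name against the headers in alphabetical
    order and return the first one that matches (or None)."""
    for name in sorted(listnames):
        if re.search(re.escape(name), headers, re.IGNORECASE):
            return name
    return None

def batch_sweep(inbox, listname):
    """Case 2: collect every queued message whose headers mention
    listname, standing in for the MH refile pass."""
    pattern = re.compile(re.escape(listname), re.IGNORECASE)
    return [headers for headers in inbox if pattern.search(headers)]

# Two copies of one cross-posted letter, one delivered by each list.
# The addresses are made up; only the Sender line differs.
TO_LINE = "To: first_nations@lists.example, nativenews@lists.example\n"
copy_from_fn = TO_LINE + "Sender: owner-first_nations@lists.example\n"
copy_from_nn = TO_LINE + "Sender: owner-nativenews@lists.example\n"

inbox = [copy_from_nn, copy_from_fn]
target = first_match_sort(inbox[0])   # sort the nativenews copy first
filed = batch_sweep(inbox, target)
print(target, len(filed))             # first_nations 2
```

Even though the nativenews copy is sorted first, the shared To: line makes "first_nations" the first alphabetical hit, and the sweep then pulls both copies into that folder.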

To solve this problem, it makes sense to read the code and understand
the sorting algorithm. (That's not hard to do, since it's short; look
at the file called "mailme".) One possibility is to add checks making
sure nothing ever gets erroneously pulled into the filter. Another is
to look over the material that has been pulled into the filter and put
anything that doesn't belong back in the inbox. I don't know which is
the right solution.
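One hedged way to attempt either option is to trust the marks each list stamps on its own copy before falling back to the To:/CC: grep. The two letters appended below suggest such marks exist: the first_nations copy carries "Sender: FN <...>" and a "[FN]" subject tag, while the nativenews copy carries a "NATIVE_NEWS:" subject tag. The Python sketch below uses the standard `email` module; the marker table `LIST_MARKERS` and the functions `list_from_markers` and `verify_batch` are hypothetical, and real markers would have to be collected for each archived list.

```python
from email import message_from_string

# Hypothetical per-list markers modeled on the two sample letters:
# (header name, lowercase substring that identifies the list's own copy).
LIST_MARKERS = {
    "first_nations": [("Subject", "[fn]"), ("Sender", "fn <")],
    "nativenews": [("Subject", "native_news:")],
}

def list_from_markers(raw):
    """Return the one list whose markers appear in the message, or None
    if no list (or more than one) claims it; the caller would then fall
    back to the existing first-match grep."""
    msg = message_from_string(raw)
    hits = set()
    for listname, markers in LIST_MARKERS.items():
        for header, needle in markers:
            if needle in str(msg.get(header, "")).lower():
                hits.add(listname)
    return hits.pop() if len(hits) == 1 else None

def verify_batch(batch, listname):
    """Check a refile batch after the fact: keep messages that belong to
    listname (or can't be identified) and return the rest for the inbox."""
    keep, put_back = [], []
    for raw in batch:
        found = list_from_markers(raw)
        if found is None or found == listname:
            keep.append(raw)
        else:
            put_back.append(raw)
    return keep, put_back

# Made-up copies that carry the same kind of marks as the real letters.
fn_copy = ("Sender: FN <owner@lists.example>\n"
           "Subject: [FN] Common Cause Email Alert - Soft Money\n\nbody\n")
nn_copy = ("Sender: owner-nativenews@lists.example\n"
           "Subject: NATIVE_NEWS: Common Cause Email Alert - Soft Money\n\nbody\n")

keep, put_back = verify_batch([fn_copy, nn_copy], "first_nations")
print(len(keep), len(put_back))   # 1 1
```

Calling `list_from_markers` before the alphabetical grep corresponds to the first option; running `verify_batch` over what a refile pulled in corresponds to the second. A message that both lists claim, or neither, is left alone, so the check only moves mail when it has positive evidence. The added cost is one header parse per message.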

Severity: Dozens of pieces of mail have been misfiled (that I know
of), and the problem is getting more common as the archive grows.

I've appended two letters that failed in real life. Note that they
originated as the same email and went to two lists, each of which left
its mark on both the header and the body of the message. Also, it
looks like the cut and paste operation may have wrapped some of the
headers (just imagine them validly formatted!).

----------------------------

Return-Path: [EMAIL PROTECTED]
Return-Path: <[EMAIL PROTECTED]>
Received: from home.ease.lsoft.com (home.ease.lsoft.com [206.241.12.9])
        by jab.org (8.8.7/8.8.7) with ESMTP id UAA22528
        for <[EMAIL PROTECTED]>; Fri, 18 Dec 1998 20:09:30 -0500
Received: from home (home.ease.lsoft.com) by home.ease.lsoft.com
        (LSMTP for Windows NT v1.1b) with SMTP id
        <[EMAIL PROTECTED]>; Fri, 18 Dec 1998 20:11:27 -0500
Received: from HOME.EASE.LSOFT.COM by HOME.EASE.LSOFT.COM (LISTSERV-TCP/IP
          release 1.8d) with spool id 18626252 for
          [EMAIL PROTECTED]; Fri, 18 Dec 1998 20:10:53
          -0500
Received: from pond.com (wanda.vf.pond.com) by home.ease.lsoft.com (LSMTP for
          Windows NT v1.1b) with SMTP id
          <[EMAIL PROTECTED]>;
          Fri, 18 Dec 1998 20:10:52 -0500
Received: from [205.160.4.248] (ascend1-31.vf.pond.com [205.160.4.237]) by
          pond.com (8.9.1/8.9.1) with ESMTP id UAA06719; Fri, 18 Dec
          1998
          20:08:57 -0500 (EST)
X-Sender: [EMAIL PROTECTED]
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Message-ID:  <v0313030fb2a0aaed71ed@[205.160.4.248]>
Date:         Fri, 18 Dec 1998 20:08:19 -0500
Reply-To: FN <[EMAIL PROTECTED]>
Sender: FN <[EMAIL PROTECTED]>
From: Sonja Keohane <[EMAIL PROTECTED]>
Subject:      [FN] Common Cause Email Alert - Soft Money
Comments: To: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]

        Using this address<http://www.commoncause.org/laundromat > if
you fill in the word "tribe" where is says "search by donor" there is
<snip>



Return-Path: [EMAIL PROTECTED]
Return-Path: <[EMAIL PROTECTED]>
Received: from zebra.esosoft.net (zebra.esosoft.net [207.153.253.162])
        by jab.org (8.8.7/8.8.7) with ESMTP id UAA22520
        for <archive@jab.org>; Fri, 18 Dec 1998 20:09:15 -0500
Received: from tiger.esosoft.net (tiger.esosoft.net [192.41.6.127])
        by zebra.esosoft.net (8.9.1/8.9.1) with ESMTP id UAA06208;
        Fri, 18 Dec 1998 20:08:58 -0500 (EST)
Received: from localhost (tiger@localhost) by tiger.esosoft.net
        (8.8.5) id SAA14897; Fri, 18 Dec 1998 18:08:42 -0700 (MST)
Received: by tiger.esosoft.net (bulk_mailer v1.9); Fri, 18 Dec 1998
        18:08:42 -0700
Received: (tiger@localhost) by tiger.esosoft.net (8.8.5) id SAA14884;
        Fri, 18 Dec 1998 18:08:41 -0700 (MST)
Received: from pond.com (wanda.vf.pond.com [198.69.82.2]) by
        tiger.esosoft.net (8.8.5) id SAA14880; Fri, 18 Dec 1998 18:08:39 -0700
        (MST)
X-Authentication-Warning: tiger.esosoft.net: Host wanda.vf.pond.com
        [198.69.82.2] claimed to be pond.com
Received: from [205.160.4.248] (ascend1-31.vf.pond.com
        [205.160.4.237])
        by pond.com (8.9.1/8.9.1) with ESMTP id UAA06719;
        Fri, 18 Dec 1998 20:08:57 -0500 (EST)
X-Sender: [EMAIL PROTECTED]
Message-Id: <v0313030fb2a0aaed71ed@[205.160.4.248]>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Fri, 18 Dec 1998 20:08:19 -0500
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
From: Sonja Keohane <[EMAIL PROTECTED]>
Subject: NATIVE_NEWS: Common Cause Email Alert - Soft Money
Sender: [EMAIL PROTECTED]
Reply-To: Sonja Keohane <[EMAIL PROTECTED]>

And now:Sonja Keohane <[EMAIL PROTECTED]> writes:

        Using this address<http://www.commoncause.org/laundromat > if
you fill in the word "tribe" where is says "search by donor" there is
<snip>
