=================================================
Bug #2 : When one mail goes to two archived lists
=================================================
This is the problem mentioned in the FAQ. It is kind of hard and (to me) kind of interesting.

We archive two mailing lists: first_nations and nativenews. Often someone will CC: a letter to both lists. Thus, the system gets two copies. Both copies (unfortunately) end up in first_nations.

Why? Two cases.

1) Our first pass of sorting heuristics works roughly like this: Examine the incoming message. Take all known listnames from the archives, and grep each against the headers of the incoming mail. As soon as we find a match, file the mail away. Since we go alphabetically, we end up matching [EMAIL PROTECTED] for both letters that bear "To: [EMAIL PROTECTED], [EMAIL PROTECTED]".

It has been suggested that one approach might be to cross-check against envelope addresses or other headers.

Cost: This particular sort occurs several hundred times a day, and on average requires 125 greps and takes several seconds. So it's already kind of expensive; I'd rather not see it get a lot more expensive. There is some leeway, as the time bottleneck is in the MHonArc archiving runs.

2) In order to improve efficiency, if any mail has queued up in the inbox, we gracefully switch to a batch operation. Once the initial sorting and list determination is made, we also grab any other mail in the inbox for that particular list, and archive it all together. This is done via MH refile commands.

So, for example, we get two identical letters addressed "To: [EMAIL PROTECTED], [EMAIL PROTECTED]", one from each list. Let's say they arrive nearly simultaneously. The first one will get sorted to first_nations, possibly erroneously (see above). Then we will do a sweep of the inbox looking for other first_nations mail. The MH refile commands will grab the other message. Both will get refiled to first_nations.

Cost: Cost is important in the MH refile section - this is the batch mode for when things get really busy. Any expense here will affect performance limits.
On the other hand, MHonArc is still the bottleneck, so don't feel too constrained.

To solve this problem, it makes sense to read the code and understand the sorting algorithm. (Not hard to do, it's short; look at the file called "mailme".)

One possibility is to do checks to make sure nothing ever gets erroneously pulled into the filter. Another possibility is to look over the material that has been pulled into the filter and put things back in the inbox if they are not correct. I don't know the right solution.

Severity: Dozens of pieces of mail have been misfiled (that I know about) and the problem is getting more common as the archive grows.

I've appended two letters that failed in real life. Note that they originated as the same email and went to two lists, each of which left its mark on both the header and body of the message. Also, it looks like the cut and paste operation may have wrapped some of the headers (just imagine them validly formatted!)

----------------------------

Return-Path: [EMAIL PROTECTED]
Return-Path: <[EMAIL PROTECTED]>
Received: from home.ease.lsoft.com (home.ease.lsoft.com [206.241.12.9])
    by jab.org (8.8.7/8.8.7) with ESMTP id UAA22528
    for <[EMAIL PROTECTED]>; Fri, 18 Dec 1998 20:09:30 -0500
Received: from home (home.ease.lsoft.com) by home.ease.lsoft.com
    (LSMTP for Windows NT v1.1b) with SMTP id <[EMAIL PROTECTED]>;
    Fri, 18 Dec 1998 20:11:27 -0500
Received: from HOME.EASE.LSOFT.COM by HOME.EASE.LSOFT.COM
    (LISTSERV-TCP/IP release 1.8d) with spool id 18626252
    for [EMAIL PROTECTED]; Fri, 18 Dec 1998 20:10:53 -0500
Received: from pond.com (wanda.vf.pond.com) by home.ease.lsoft.com
    (LSMTP for Windows NT v1.1b) with SMTP id <[EMAIL PROTECTED]>;
    Fri, 18 Dec 1998 20:10:52 -0500
Received: from [205.160.4.248] (ascend1-31.vf.pond.com [205.160.4.237])
    by pond.com (8.9.1/8.9.1) with ESMTP id UAA06719;
    Fri, 18 Dec 1998 20:08:57 -0500 (EST)
X-Sender: [EMAIL PROTECTED]
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Message-ID:
    <v0313030fb2a0aaed71ed@[205.160.4.248]>
Date: Fri, 18 Dec 1998 20:08:19 -0500
Reply-To: FN <[EMAIL PROTECTED]>
Sender: FN <[EMAIL PROTECTED]>
From: Sonja Keohane <[EMAIL PROTECTED]>
Subject: [FN] Common Cause Email Alert - Soft Money
Comments: To: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]

Using this address<http://www.commoncause.org/laundromat > if you fill in
the word "tribe" where is says "search by donor" there is
<snip>

Return-Path: [EMAIL PROTECTED]
Return-Path: <[EMAIL PROTECTED]>
Received: from zebra.esosoft.net (zebra.esosoft.net [207.153.253.162])
    by jab.org (8.8.7/8.8.7) with ESMTP id UAA22520
    for <archive@jab.org>; Fri, 18 Dec 1998 20:09:15 -0500
Received: from tiger.esosoft.net (tiger.esosoft.net [192.41.6.127])
    by zebra.esosoft.net (8.9.1/8.9.1) with ESMTP id UAA06208;
    Fri, 18 Dec 1998 20:08:58 -0500 (EST)
Received: from localhost (tiger@localhost) by tiger.esosoft.net (8.8.5)
    id SAA14897; Fri, 18 Dec 1998 18:08:42 -0700 (MST)
Received: by tiger.esosoft.net (bulk_mailer v1.9);
    Fri, 18 Dec 1998 18:08:42 -0700
Received: (tiger@localhost) by tiger.esosoft.net (8.8.5) id SAA14884;
    Fri, 18 Dec 1998 18:08:41 -0700 (MST)
Received: from pond.com (wanda.vf.pond.com [198.69.82.2])
    by tiger.esosoft.net (8.8.5) id SAA14880;
    Fri, 18 Dec 1998 18:08:39 -0700 (MST)
X-Authentication-Warning: tiger.esosoft.net: Host wanda.vf.pond.com
    [198.69.82.2] claimed to be pond.com
Received: from [205.160.4.248] (ascend1-31.vf.pond.com [205.160.4.237])
    by pond.com (8.9.1/8.9.1) with ESMTP id UAA06719;
    Fri, 18 Dec 1998 20:08:57 -0500 (EST)
X-Sender: [EMAIL PROTECTED]
Message-Id: <v0313030fb2a0aaed71ed@[205.160.4.248]>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Fri, 18 Dec 1998 20:08:19 -0500
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
From: Sonja Keohane <[EMAIL PROTECTED]>
Subject: NATIVE_NEWS: Common Cause Email Alert - Soft Money
Sender: [EMAIL PROTECTED]
Reply-To: Sonja Keohane <[EMAIL PROTECTED]>

And now:Sonja Keohane <[EMAIL PROTECTED]> writes:

Using this address<http://www.commoncause.org/laundromat > if you fill in
the word "tribe" where is says "search by donor" there is
<snip>
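
----------------------------

For anyone picking this up, here is a rough Python sketch of case 1 and of the "cross-check against other headers" idea, run against cut-down stand-ins for the two letters above. It is not the mailme code and is only meant to make the failure easy to reproduce. The addresses at lists.example.org are made up (the real ones are not shown here), the function names are hypothetical, and the idea that each list server stamps its own address into Sender: is an assumption suggested by the sample headers, not something I have verified.

```python
from email import message_from_string

# List names from the report; the real system takes them from the archives.
LISTS = ["first_nations", "nativenews"]


def first_match(raw, lists=LISTS):
    """Case 1 as described: try list names alphabetically against the
    whole header block and stop at the first hit."""
    headers = raw.split("\n\n", 1)[0].lower()
    for name in sorted(lists):
        if name in headers:
            return name
    return None


def sender_first(raw, lists=LISTS):
    """Hypothetical cross-check: look at Sender: first (assumed to be
    stamped by the list server), then fall back to first_match."""
    msg = message_from_string(raw)
    stamped = " ".join(msg.get_all("Sender", [])).lower()
    for name in sorted(lists):
        if name in stamped:
            return name
    return first_match(raw, lists)


# Stand-ins for the two copies; all addresses are invented for illustration.
TO = "To: first_nations@lists.example.org, nativenews@lists.example.org\n"
via_fn = ("Sender: FN <first_nations@lists.example.org>\n" + TO
          + "Subject: [FN] test\n\nbody\n")
via_nn = (TO + "Sender: owner-nativenews@lists.example.org\n"
          + "Subject: NATIVE_NEWS: test\n\nbody\n")

print(first_match(via_fn), first_match(via_nn))    # first_nations first_nations
print(sender_first(via_fn), sender_first(via_nn))  # first_nations nativenews
```

The extra check reads one header per message rather than adding more greps, so it should be cheap next to the MHonArc runs. It does nothing for case 2 on its own: the inbox sweep would need the same test before refiling, or it will still pull the other copy.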