On 05/14/2013 10:17 AM, Jed Brown wrote:
> I would like to be able to search the archives of a mailman list using
> the Message-ID, ideally using a stable URL like
>
>   http://mid.gmane.org/${message_id}
>   http://mail-archive.com/search?l=mid&q=${message_id}
>
> but preferably on our own host as we're not currently mirrored and would
> rather link to our own archives when referencing an old discussion on
> the list.  Our current archives (e.g., [1]) are searched using htdig,
> but it doesn't seem to support query by Message-ID.  Your wiki page [2]
> also suggests Swish, MnoGoSearch, and Namazu.  Can any of these search
> by Message-ID, or is our best bet to get indexed by mail-archive.com and
> direct people there?

The Message-ID of the post is in the HTML page containing the post, but
it appears only in an In-Reply-To= fragment of a mailto: URL, which
htdig does not index. It is also URL-encoded, so <, > and @ appear as
%3C, %3E and %40 respectively. The actual Message-ID: headers are in the
periodic *.txt files.

This leads to a few possibilities:

- Teach htdig to index the .txt files. This may be tricky; I just spent
  a couple of minutes looking at this and didn't see how.
- Change the noindex start and end tags in the list's
  archives/private/LIST/htdig/LIST.conf file so that everything in the
  HTML files, including the URL-encoded Message-ID, is indexed.
- Write a separate CGI search script to search the .txt files for the
  Message-ID.

Or use mail-archive.com, which is probably simplest.

> Second question: Why are direct recipients dropped from the Cc header of
> the copy sent via the list?  This seems partially addressed in the
> archives [3], but I think it's important for high-volume lists when
> people filter conversations based on whether they are a direct
> recipient.  Is there an option somewhere to keep Cc headers intact
> without changing other behavior?
>
> [1] http://lists.mcs.anl.gov/pipermail/petsc-dev/
> [2] http://wiki.list.org/display/DOC/How+do+I+make+the+archives+searchable
> [3] http://mail.python.org/pipermail/mailman-developers/2006-May/018777.html

I've learned a lot in the last 7 years ;)

The reason is to keep the Cc: list from growing excessively long in long
threads involving many people (see the subsequent post(s) in that
thread).

--
Mark Sapiro <[email protected]>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

------------------------------------------------------
Mailman-Users mailing list [email protected]
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: http://mail.python.org/mailman/options/mailman-users/archive%40jab.org
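
For anyone who wants to try the "separate search script" route mentioned
in the reply, here is a minimal Python sketch of the two pieces involved:
the URL encoding of a Message-ID as described above, and a scan of
archive text for a matching Message-ID: header. It is not part of Mailman
or htdig. The function names, the sample Message-ID, and the sample
archive text are all made up for illustration, and the sketch assumes the
periodic .txt files carry one plain "Message-ID: <...>" header line per
post.

```python
import re
from urllib.parse import quote, unquote

# Matches a header line such as "Message-ID: <abc@example.org>".
# The angle brackets are treated as optional so both forms are accepted.
MESSAGE_ID_HEADER = re.compile(r"^Message-ID:\s*<?([^<>\s]+)>?\s*$", re.IGNORECASE)


def encode_message_id(message_id: str) -> str:
    """URL-encode a Message-ID, so <, > and @ become %3C, %3E and %40."""
    return quote(message_id, safe="")


def decode_message_id(encoded: str) -> str:
    """Reverse encode_message_id, e.g. for an ID taken from a mailto: URL."""
    return unquote(encoded)


def find_message_id(lines, message_id):
    """Return the 1-based line numbers of matching Message-ID: headers.

    `lines` is any iterable of text lines (for example an open archive
    file). `message_id` may be given with or without angle brackets.
    """
    wanted = message_id.strip().strip("<>")
    hits = []
    for number, line in enumerate(lines, start=1):
        match = MESSAGE_ID_HEADER.match(line)
        if match and match.group(1) == wanted:
            hits.append(number)
    return hits


# Hypothetical archive excerpt, used only to exercise the functions above.
SAMPLE_ARCHIVE = """\
From jed at example.org  Tue May 14 10:17:00 2013
From: jed at example.org (Jed)
Date: Tue, 14 May 2013 10:17:00 -0500
Subject: [example-list] searching by Message-ID
Message-ID: <87abc123.fsf@example.org>

Body text.
"""
```

A CGI wrapper would decode the query value with decode_message_id, run
find_message_id over each periodic .txt file, and then map a hit back to
the corresponding HTML page. That last mapping depends on how the archive
is laid out and is not covered here.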
