On 12/17/2010 4:55 AM, Lukáš Vlček wrote:
>
> I am looking at a best practice way how to integrate mailman with external
> search engine. I found the following Wiki page [1] which contains a link to
> Ext_Arch.py template which is brainchild of Mark Sapiro and Cedric Jeanneret
> [2]. Cedric was after indexing emails using Xapian and his implementation of
> the Ext_Arch.py can be found here [3]. This all looks very promising but I
> have a few questions/concerns:
>
> To me it seems that the PUBLIC_EXTERNAL_ARCHIVER and
> PRIVATE_EXTERNAL_ARCHIVER commands (which are both set in mm_cfg.py) are
> executed only when a new message arrives, that means it is not executed when
> bin/arch is executed. This means that if there has been running some mail
> list on mailman for a few years now and now I would like to allow searching
> its content via new external search engine (like Xapian) it is simply not
> enough to add external archiver and restart mailman because this would index
> only newly added messages. Am I right?

Yes, you are right. The design intent of external archivers is that they
provide a hook to use an external process for both archiving and
searching of the external archive. External archivers were never
intended to be used to index the built-in pipermail archive. Thus, the
Ext_Arch.py template is just a kludge which is admittedly incomplete in
this respect.

> How can I then have reindexed old content from that mail list into Xapian as
> well? bin/arch <maillist> does not do that as it does not execute external
> archivers. Moreover, running bin/arch can change URLs of individual public
> emails (re-numbering) and that is probably unacceptable. So is there any way
> how to iterate over existing emails, parse them and get an existing URL
> value for them? (Such information could be then used to re-index old content
> into external search server without need to run bin/arch).

    find /path/to/archives/private/LISTNAME \
     | egrep "[0-9]{6}.html" \
     | sed "s;.*archives/private;http://www.example.com/pipermail;"

with the obvious modification will get the URLs. Will that be enough?

>
> [1]
> http://wiki.list.org/display/DOC/4.87+How+do+I+invoke+some+process+on+messages+as+they+are+added+to+the+pipermail+archive
> [2] http://www.mail-archive.com/mailman-users@python.org/msg56679.html
> [3]
> https://bugs.launchpad.net/mailman/+bug/531942/+attachment/1199211/+files/archive-and-index.py

-- 
Mark Sapiro <m...@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

------------------------------------------------------
Mailman-Users mailing list Mailman-Users@python.org
http://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: http://mail.python.org/mailman/options/mailman-users/archive%40jab.org
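
The find/egrep/sed pipeline above only prints URLs. If the goal is to hand each old message to an indexer together with its existing URL, a small script may be easier to extend. What follows is a minimal sketch under stated assumptions, not part of Mailman: the function name archived_message_urls, the example paths and the base URL are all hypothetical, and it assumes message pages are named with six digits plus .html, as the egrep pattern suggests. Unlike the egrep, the pattern here is anchored to the whole file name, so index pages such as thread.html are skipped.

```python
import os
import re

# Assumption: pipermail message pages are named like 000123.html (six digits).
MESSAGE_PAGE = re.compile(r"^[0-9]{6}\.html$")


def archived_message_urls(archive_root, listname, base_url):
    """Yield (file_path, url) for each message page of one list.

    archive_root -- directory holding the per-list archives,
                    e.g. /path/to/archives/private (hypothetical)
    listname     -- name of the list directory under archive_root
    base_url     -- public prefix that replaces archive_root,
                    e.g. http://www.example.com/pipermail (hypothetical)
    """
    list_dir = os.path.join(archive_root, listname)
    for dirpath, dirnames, filenames in os.walk(list_dir):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            if not MESSAGE_PAGE.match(name):
                continue
            path = os.path.join(dirpath, name)
            # Same idea as the sed step: swap the filesystem prefix
            # for the URL prefix and keep the rest of the path.
            rel = os.path.relpath(path, archive_root)
            url = base_url.rstrip("/") + "/" + rel.replace(os.sep, "/")
            yield path, url
```

Looping over archived_message_urls("/path/to/archives/private", "LISTNAME", "http://www.example.com/pipermail") gives one (path, url) pair per message page; each pair could then be passed to whatever indexing call the external search engine provides, without running bin/arch.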