On Sat, 23 May 2009, Sebastian Hagedorn wrote:

I was wondering if it was possible to use a regularly-computed word index
to speed up searching, as per Web search engines.

FWIW, Cyrus IMAP lets you do that as an option. It's called "squatting".

Interesting.

When I test searching via IMAP on MIX folders, it's actually pretty fast in realtime - fast enough for general use. But IMAP AFAIK only lets you search one folder at a time (OK, the protocol will probably let a multithreaded client run several searches in parallel). If you have lots of folders (some of our users have over 500) then that becomes less practical.
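
For illustration, a minimal sketch of the one-folder-at-a-time loop, using the call shapes of Python's standard imaplib. The helper name search_all_folders is made up for this example, and it assumes conn is an already-authenticated imaplib.IMAP4 object and that the folder names are already quoted as the server needs them.

```python
def search_all_folders(conn, folders, criterion):
    """Run one IMAP SEARCH per folder and collect the hits (hypothetical helper).

    conn      -- an authenticated imaplib.IMAP4 / IMAP4_SSL object (assumed)
    folders   -- iterable of mailbox names, quoted as the server requires
    criterion -- an IMAP search key string, e.g. 'SUBJECT "htdig"'
    """
    hits = {}
    for folder in folders:
        # readonly=True opens the folder with EXAMINE, so nothing is modified.
        typ, _ = conn.select(folder, readonly=True)
        if typ != "OK":
            continue  # skip folders that cannot be opened
        typ, data = conn.search(None, criterion)
        # data[0] is a space-separated byte string of message numbers.
        if typ == "OK" and data and data[0]:
            hits[folder] = data[0].split()
    return hits
```

With 500 folders this is 500 round trips per query, which is the part that stops being practical.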

Per my last post, I've been playing with htdig and tinymail (my webmail client, originally designed for pre-iPhone cellphones). This works tolerably well, amazingly enough. One caveat - a spider run against a regular webmail service will clear the unread flags on all messages, because every page it fetches marks the message as read. You need to use BODY.PEEK and/or EXAMINE, not SELECT. (I ran into a bug in an older imapd where EXAMINE wasn't blocking flag modification in MIX folders; it seems OK in 2007.)
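
A minimal sketch of the read-only fetch, assuming Python's standard imaplib rather than whatever tinymail does internally. In imaplib, select() with readonly=True sends EXAMINE instead of SELECT, and BODY.PEEK[] fetches the message without setting \Seen. The helper name is made up for this example.

```python
def fetch_without_marking_read(conn, mailbox, msg_num):
    """Fetch one message without setting \\Seen (hypothetical helper).

    conn is assumed to be an authenticated imaplib.IMAP4 / IMAP4_SSL object.
    """
    # readonly=True makes imaplib issue EXAMINE instead of SELECT.
    typ, _ = conn.select(mailbox, readonly=True)
    if typ != "OK":
        raise RuntimeError("cannot open %s read-only" % mailbox)
    # BODY.PEEK[] returns the whole message but, unlike BODY[], does not
    # set the \Seen flag.
    typ, data = conn.fetch(str(msg_num), "(BODY.PEEK[])")
    if typ != "OK":
        raise RuntimeError("fetch failed for message %s" % msg_num)
    # For a literal response, data[0] is a (response_header, raw_bytes) tuple.
    return data[0][1]
```

Using both belts and braces (EXAMINE plus PEEK) costs nothing and covers a server that mishandles one of them.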

When I say "tolerably well", the search part is OK - I can search multiple mailboxes and get more weight for senders' names and subject lines by tuning the HTML representation or search parameters. But indexing is very slow - for one thing, Web robots aren't supposed to hammer servers into the ground, so indexing speed wasn't a design priority. Also, htdig understands GET with If-Modified-Since, which I can translate to FETCH $msg BODY.PEEK[HEADER.FIELDS (Date)], but it doesn't understand that I might not want to bother checking an old page at all if I indexed it last night, and only want the new ones indexed.
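
Roughly, the two checks look like this in Python. This is a sketch, not what tinymail or htdig actually do: not_modified() compares a message's Date header with the If-Modified-Since value, and since_criterion() builds an IMAP SEARCH key that a purpose-built indexer (which htdig is not) could use to skip old messages entirely. Both function names are made up for the example.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def _parse_utc(value):
    """Parse an RFC 2822 / HTTP date; treat a missing zone as UTC."""
    dt = parsedate_to_datetime(value)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt

def not_modified(date_header, if_modified_since):
    """True if the message is no newer than the If-Modified-Since time,
    i.e. a 304 Not Modified reply would be appropriate."""
    return _parse_utc(date_header) <= _parse_utc(if_modified_since)

def since_criterion(last_run):
    """IMAP SEARCH key for messages whose internal date is on or after
    the date of last_run (a datetime.date or datetime.datetime)."""
    return "SINCE %02d-%s-%04d" % (
        last_run.day, _MONTHS[last_run.month - 1], last_run.year)
```

Note that SINCE works on the server's internal date and ignores the time of day, so a nightly run should search from the previous run's date and expect to see a few messages twice.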

It might improve if run on localhost, and it might help to use persistent HTTP connections and keep the IMAP session open. Right now I'm running it across 100BaseT on 1 GB (10%) of my personal mail, and I'll see how many days that takes ... :-7 (In principle, htdig can search attachments if they can be transmogrified into HTML, e.g. PDF, PostScript, Word etc., but that bit needs work. I broke the attachment parser in tinymail, and I don't have the external converters set up.)

--
Andrew Daviel, TRIUMF, Canada
_______________________________________________
Imap-uw mailing list
[email protected]
http://mailman2.u.washington.edu/mailman/listinfo/imap-uw
