On Mon, Jul 9, 2007, Paul J Stevens <[EMAIL PROTECTED]> said: > Aaron Stone wrote: >> I've been thinking about this for a while, and especially with the query >> time escalation code, have found that the queries which do full header >> searches are incredibly slow (for the obvious reason that they cannot use >> the index). >> >> I have two thoughts here. First, our imap search algorithm is based on >> merging multiple independent queries. We should do some planning so that a >> search that includes a free-text headervalue query does all of the other >> queries first, finds the possible result set, then chunks out the >> headervalue scan. >> >> For example, I often do a search limited by date: >> >> SEARCH SINCE SINCE 1-Feb-2007 FROM dbmail >> >> This currently breaks into two queries that search all messages, the >> latter of which has a "WHERE headervalue LIKE '%dbmail%'", and takes about >> 25 seconds and boatloads of CPU when running against my Inbox of 30,000 >> messages. Restricting it to the most recent few thousand would make it >> much much faster. > > I gave a lot of thought to this subject when I did the search > implementation as it currently stands. But I decided early on to focus > on getting correct results and defer performance to a next round. Maybe > I was wrong there, but it was the only way I could get my head around it > at the time. > > In the (broken) search setup as used in 2.0 I used a pattern where I > assigned weight to each search type during the command parsing stage and > perform the actual searches in an ordered fashion. That worked *really* > well back then. > > However, with nested search graphs this requires some extra care, but as > long as searches are on the same level of nesting (like the example > above) you can indeed use a weighted ordering of searches. > > The main gotcha lies in the fact that for now the searching and merging > is strictly separated. To use weighted planning you'd have to > search/merge each level in the tree separately. Search-n-merge while > traversing the search graph so to speak.
Cool, thanks for the history of your thinking on this problem! I wonder if anything besides full text header value searches is expensive now that we have the header values broken out and indexed? Date ranges, flags, and so on all seem to be rather cheap. So perhaps we can go to just two stages: do all cheap stuff, generate a candidate list, then do all the expensive stuff. Require two iterations of search-merge and some kind of placeholder marker. Aaron _______________________________________________ Dbmail-dev mailing list Dbmail-dev@dbmail.org http://twister.fastxs.net/mailman/listinfo/dbmail-dev