Aaron Stone wrote:
> >> > But I also have a simpler idea that will still speed things up. It
> >> > involves using a regexp SQL fulltext search. While it's definitely slow
> >> > compared to a changed storage (the db won't be able to use indexes),
> >> > it's still faster because it does not involve any parsing, and uses the
> >> > well-tuned SQL server code to do fulltext searching. 
> > 
> > That will still be awfully slow because the database has to read every
> > single page, lot of expensive I/O.
> 
> Not any slower than requesting, parsing, and searching every message like
> we are now! I think it's an excellent idea: first, REGEXP for messages
> that might match, then parse those to see if they match in the right place
> (specific header, body, etc.).

I planned this for 2.1, which has the is_header field. And if we only
had to search headers, regexp alone would do fine, since a regexp can
include something like "\nFrom:". Besides, the presence of is_header
would make a regexp header search much faster (still not nearly as
fast as the two- or three-table solution, of course, but at least we
would not have to search through the bodies).
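
To make the idea concrete, here is a rough sketch of the two-stage
search: a regexp prefilter restricted to header blocks, then a parse of
the surviving candidates to confirm the match sits in the right header.
It is only an illustration under assumptions: the table name
"messageblks", the "message_id" column, and the helper search_from() are
hypothetical, and an in-memory SQLite database with a user-defined
REGEXP function stands in for the real SQL server.

```python
import re
import sqlite3
from email.parser import HeaderParser


def regexp(pattern, value):
    # SQLite evaluates "value REGEXP pattern" as regexp(pattern, value).
    if value is None:
        return False
    return re.search(pattern, value, re.IGNORECASE) is not None


# Stand-in database; table and column names are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute(
    "CREATE TABLE messageblks ("
    " id INTEGER PRIMARY KEY,"
    " message_id INTEGER,"
    " is_header INTEGER,"
    " messageblk TEXT)"
)
conn.executemany(
    "INSERT INTO messageblks (message_id, is_header, messageblk) VALUES (?, ?, ?)",
    [
        (1, 1, "From: alice@example.org\nSubject: hello\n"),
        (1, 0, "bob wrote to me yesterday\n"),
        (2, 1, "From: carol@example.org\nSubject: about bob\n"),
        (2, 0, "From: bob is mentioned only in the body\n"),
        (3, 1, "Subject: x\nFrom: bob@example.org\n"),
        (3, 0, "nothing relevant here\n"),
    ],
)


def search_from(conn, needle):
    """Return ids of messages whose From header contains needle."""
    # Stage 1: let the database discard header blocks that cannot match.
    # Bodies are skipped entirely thanks to is_header.
    rows = conn.execute(
        "SELECT message_id, messageblk FROM messageblks"
        " WHERE is_header = 1 AND messageblk REGEXP ?"
        " ORDER BY message_id",
        (re.escape(needle),),
    ).fetchall()

    # Stage 2: parse only the candidates and check the specific header.
    matches = []
    for message_id, block in rows:
        sender = HeaderParser().parsestr(block).get("From", "")
        if needle.lower() in sender.lower():
            matches.append(message_id)
    return matches


print(search_from(conn, "bob"))
```

In this sample data, message 2 passes the prefilter because "bob"
appears in its Subject line, and the parse step then rejects it. For a
pure header search the second stage could probably be dropped by
anchoring the regexp itself, along the lines of "(^|\n)From:[^\n]*bob",
though folded header lines would need extra care.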

We could introduce is_header usage even in 2.0.x, because the field is
present in 2.0. But is a database upgrade script acceptable, even if it
only fills in an already-present field?
 
Yours, Mikhail Ramendik

