Aaron Stone wrote:
> >> > But I also have a simpler idea that will still speed things up. It
> >> > involves using a regexp SQL fulltext search. While it's definitely slow
> >> > compared to a changed storage (the db won't be able to use indexes),
> >> > it's still faster because it does not involve any parsing, and uses the
> >> > well-tuned SQL server code to do fulltext searching.
> >
> > That will still be awfully slow because the database has to read every
> > single page, lot of expensive I/O.
>
> Not any slower than requesting, parsing, and searching every message like
> we are now! I think it's an excellent idea: first, REGEXP for messages
> that might match, then parse those to see if they match in the right place
> (specific header, body, etc.).

I planned this for 2.1, which has the is_header field. If we only needed
to search headers, regexp alone would do well, since a regexp can include
something like "\nFrom:". Besides, is_header would make a regexp header
search much faster: still not nearly as fast as the two- or three-table
solution, of course, but at least we would not have to search through the
bodies.

We could introduce is_header usage even in 2.0.x, because the field is
already present in 2.0. But is a database upgrade script acceptable, even
if it only fills in an already-present field?

Yours,
Mikhail Ramendik
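
P.S. To make the two-step idea concrete, here is a rough sketch in Python.
It is only an illustration under loud assumptions, not the real code: it
uses an in-memory SQLite database (where REGEXP has to be supplied as a
user function) instead of our actual backends, and the messageblks table
here is a simplified, hypothetical stand-in with just message_id,
is_header and messageblk columns. The names make_db and search_header are
made up for the example.

```python
import re
import sqlite3
from email import message_from_string

# Hypothetical sample data: (message_id, is_header, messageblk).
BLOCKS = [
    (1, 1, "From: alice@example.org\nSubject: hello\n\n"),
    (1, 0, "body text mentioning nothing\n"),
    (2, 1, "From: bob@example.org\nSubject: about alice@example.org\n\n"),
    (3, 1, "From: carol@example.org\nSubject: lunch\n\n"),
    (3, 0, "alice@example.org said hi\n"),
]


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X): pattern first, value second.
    if value is None:
        return 0
    return 1 if re.search(pattern, value, re.IGNORECASE) else 0


def make_db(blocks):
    # Assumption: a simplified stand-in table, not the real schema.
    conn = sqlite3.connect(":memory:")
    conn.create_function("REGEXP", 2, regexp)
    conn.execute(
        "CREATE TABLE messageblks "
        "(message_id INTEGER, is_header INTEGER, messageblk TEXT)"
    )
    conn.executemany("INSERT INTO messageblks VALUES (?, ?, ?)", blocks)
    return conn


def search_header(conn, header, needle):
    # Step 1: let the database pick candidates with REGEXP, looking only
    # at header blocks so that the bodies are never scanned.
    rows = conn.execute(
        "SELECT message_id, messageblk FROM messageblks "
        "WHERE is_header = 1 AND messageblk REGEXP ? "
        "ORDER BY message_id",
        (re.escape(needle),),
    ).fetchall()

    # Step 2: parse only the candidates and check that the text really
    # occurs in the requested header, not somewhere else in the block.
    hits = []
    for message_id, block in rows:
        msg = message_from_string(block)
        values = msg.get_all(header, [])
        if any(needle.lower() in value.lower() for value in values):
            hits.append(message_id)
    return hits
```

With the sample data, search_header(make_db(BLOCKS), "From",
"alice@example.org") should give [1]: the regexp step also picks up
message 2 (the address is in its Subject), and the parsing step throws it
out again. Message 3 is never touched, because the address only appears in
its body.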