you could, but is it worth doing it yourself when MySQL is building it for you?
http://www.mysql.com/doc/en/Fulltext_Search.html
From the top of this page:
6.8 MySQL Full-text Search
As of Version 3.23.23, MySQL has support for full-text indexing and searching. Full-text indexes in MySQL are an index of type FULLTEXT. FULLTEXT indexes are used with MyISAM tables only and can be created from CHAR, VARCHAR, or TEXT columns at CREATE TABLE time or added later with ALTER TABLE or CREATE INDEX. For large datasets, it will be much faster to load your data into a table that has no FULLTEXT index, then create the index with ALTER TABLE (or CREATE INDEX). Loading data into a table that already has a FULLTEXT index could be significantly slower.
Moreover, mail messages will be a undetermined variable length. Can MySQL support a 32-bit VARCHAR? What about type TEXT? Or 8-bit or even 16-bit character sets? Since you might be storing a lot of MIME bodypart types, can it handle BLOBs, and can it handle them well? Or, do you do parsing within your archive application and store the entire message somewhere outside of the database, while storing a FULLTEXT index of only the bodypart types you declare to be human-readable?
What if you want to do a case-sensitive search? In that case, it doesn't look like FULLTEXT or MATCH will do you any good, since MATCH is declared to be case-insensitive. Or what if you want to search for hyphenated literals? It seems that MATCH considers them to be word breaks even within literal searches.
If you were just storing into a TEXT and then doing SELECT LIKE into it, I'd agree with you. But MySQL is doing interesting things here. Why not leverage it?
I'm not sure it really helps in this case. I'm not sure it can handle the amounts of data that might need to be stored into a field, or the different character sets that might need to be used. I'm also concerned about what using this function might do to the overall speed and size of the database.
On the page quoted above, look for benchmark data reported by Jim Nguyen and John Takacs. Two million rows with text and multiple word searches (three or more) taking 30-seconds to a minute to complete, is not good performance. Three to five million rows, with searches taking 50 seconds or more for single words, is not good performance.
Now, consider how many words might be in a single message (hundreds to thousands or even tens of thousands), and how many messages might be in a single archive (thousands to millions). If each message was contained within a row, this would be dead-Universe slow.
-- Brad Knowles, <[EMAIL PROTECTED]>
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, Historical Review of Pennsylvania.
GCS/IT d+(-) s:+(++)>: a C++(+++)$ UMBSHI++++$ P+>++ L+ !E-(---) W+++(--) N+ !w--- O- M++ V PS++(+++) PE- Y+(++) PGP>+++ t+(+++) 5++(+++) X++(+++) R+(+++) tv+(+++) b+(++++) DI+(++++) D+(++) G+(++++) e++>++++ h--- r---(+++)* z(+++)
_______________________________________________ Mailman-Developers mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/mailman-developers