From: "Freddie Bingham" <[EMAIL PROTECTED]> > > (1) Why is such a query as this allowed? > > > > WHERE MATCH (text) > > AGAINST ('+s*' IN BOOLEAN MODE) > > > > This returns everything that starts with an 's', as expected > > by the syntax. > > Why is it not stopped by the mysql minimum word length? I
The minimum word length still applies to the items that are found! A record with "so very cruel" in the `text` column will certainly not be found. > > have no desire to allow a search for every word that begins > > with an 's' as it can be resourceful and will return results > > with questionable usage. So I'll take care of not allowing > > these searches in a higher level It is always wise to check user input before applying it to a query! > > (2) Why does this query also return everything that begins with an 's' > > > > WHERE MATCH (text) > > AGAINST ('[EMAIL PROTECTED]' IN BOOLEAN MODE) > > > > Mysql is obviously silently stripping the list of garbage > > characters away from the word, which I assume was also done > > with the indexed list of words. See: http://dev.mysql.com/doc/mysql/en/fulltext-search.html ---------- MySQL uses a very simple parser to split text into words. A ``word'' is any sequence of true word characters (letters, digits, and underscores), optionally separated by no more than one sequential ''' character. For example, wasn't is parsed as a single word, but wasn''t is parsed as two words wasn and t. (And then t would be ignored as too short according to the rules following.) Also, single quotes at the ends of words are stripped; only embedded single quotes are retained. ----------- As documented, MySQL will 'ignore' those characters while building the index as they are not part of a 'word' ('word' as defined in the paragraph I quoted). As far as the search 'word' is concerned, MySQL supports a limited list of operator, so you can assume that everything else is "silently" ignored... > > I now have a serious issue since I what appears to be a long > > word, but is really only one character long. Putting quotes > > around the word seems to force it to be taken literally but > > that also takes the * literally, breaking the wildcard > > search. Where can I find a list of these characters that are > > silently stripped out? http://dev.mysql.com/doc/mysql/en/fulltext-boolean.html lists the operators that are supported: + - > < () ~ * "" > > WHERE MATCH (text) > > AGAINST ('[EMAIL PROTECTED]' IN BOOLEAN MODE) > > > > This returns all results that begin with a 't', but not 's'! This will probably be equal to : '+t* s*' or: must contain word starting with t, possible containing word(s) starting with s. Which is reflected in these results: text relevance passe 0 Type error 1 type sweetheart! 1.33333333730698 supergranny 0 > > then this query: > > > > WHERE MATCH(text) AGAINST ('[EMAIL PROTECTED]' IN > > BOOLEAN MODE) '+ qwerty* s* t*' The first + is not leading a word, so it will be ignored: 'qwerty* s* t*' So sweet sour set sells sins per se 1 so 0 passe 0 Type error 1 type sweetheart!! 2 supergranny 1 qwertyuiop 1 qwertyuiop same type 2 > > Matches all words that beginning with a 's' or a 't'! or beginning with 'qwerty'!!!! > > Anyone have a concise explanation just exactly how the > > wildcard character works in terms of real world strings such as this? Hopefully the examples above are clear enough. If you want to only support searches for words longer than say three characters it would be wise to: - only allow alphanumeric characters, single quotes, white space and supported operators - remove anything else and check for length of 'words' If the entry passes these checks you can probably apply it to a query (after escaping quotes!!). The checks can be performed fairly easy if you use some kind of regular expressions in the application that uses the query. Regards Jigal. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]