I am working currently on https://bugs.launchpad.net/launchpad/+bug/1020443 (Full text search broken for certain search terms having a "bad combination" of punctuation characters, like "?!.". This a fallout bug from my previous work on another full text search related bug: https://bugs.launchpad.net/launchpad/+bug/29713 )
As explained by stub in a comment, the stored procedure ftq() does no longer I see two options to fix this bug: (A) We can either fix the immediate problem (the fix would be quite simple) and keep the feature "treat the characters '&|!' as logical operators in full text searches". (B) Let ftq() simply remove "&|!" from queries. I prefer the option (B): Launchpad is about software development, so one option to find a bug is to paste a source code line as a search query. This can lead to odd results: - a search for "#!/bin/sh" will find texts that _do not_ contain the word "/bin/sh" because '!' is the logical "not" operator. - a search for a source code line like foo = bar & !(baz | blah) will not find a text containing the very same line: The full text index contains the four words "foo", "bar", "baz", "blah", but the query is "find texts containing 'foo' and 'bar' but not containing 'baz' or 'blah'... Note that we already support the alternative search operators "AND", "OR", "NOT". (Only the upper case words are treated as logical search operators.) Here we don't have that many conflicts: These words are stop words and thus not indexed, hence treating them in a special way does not affect most queries -- with one likely exception: SQL queries pasted as a a full text search query. But this "SQL search problem" is not new... A related problem: If a word has a '-' as a prefix, the '-' is also treated as a NOT operator. That does not cause any harm for words, but breaks for numbers. The Postgres full text indexer parses numbers different from words: select to_tsvector('abc -def'); to_tsvector ----------------- 'abc':1 'def':2 select to_tsvector('123 -456'); to_tsvector ------------------ '-456':2 '123':1 so a search for "-456" won't find a text containing "123 -456" because the query "-456" is converted into 'NOT 456'... Hence I think we should no longer treat a leading '-' like a NOT. cheers Abel _______________________________________________ Mailing list: https://launchpad.net/~launchpad-dev Post to : launchpad-dev@lists.launchpad.net Unsubscribe : https://launchpad.net/~launchpad-dev More help : https://help.launchpad.net/ListHelp