Lucyans,

Marvin and I were just discussing the QueryParser on IRC. Years ago, I reported a bug in the KinoSearch query parser:

http://www.rectangular.com/pipermail/kinosearch/2006-May/004992.html

Basically, if I searched on "PHP::Interpreter", the parser died. Marvin fixed this bug and, I think partly as a result, introduced the `heed_colons` attribute that persists today in Lucy::Search::QueryParser.

But as I understand it, `heed_colons` has three issues:

1. It adds complexity to the parser (simpler is better).
2. It has a security vulnerability: if a user searches on "secret_field:foo", it will search only secret_field, and you might not want that.
3. If a field doesn't exist, the results may be meaningless.

In discussing these issues with Marvin, he expressed a strong desire not to get into QueryParser wars, and I can understand that. I think that one of the strengths of Lucy is that the default QueryParser offers a decent 80% solution for most users, while giving toolkit hackers the power to do even more.

With that in mind, I think we've come up with a solution to the above issues that actually *simplifies* QP a bit:

* Deprecate `heed_colons`. Always heed colons.
* If you search for "foo:bar" and the field "foo" doesn't exist or is not public, treat the whole string as a term.

So, addressing the above three points, this change would:

1. Remove complexity (or at least deprecate it).
2. Prevent private fields from being searched.
3. Return relevant results when a colon term does not match a public field.

As a result, "module:PHP::Interpreter" will properly search "PHP OR Interpreter IN module", "PHP::Interpreter" will search "PHP OR Interpreter", and "secret_field:whatever" will search "secret OR field OR whatever".

Thoughts?

Best,

David
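
P.S. To make the proposed rule concrete, here is a rough sketch in Python. It is not Lucy code and not how QueryParser is implemented: the function name `resolve_colon_term`, the `public_fields` set, and the alphanumeric tokenizer are all stand-ins I made up for illustration. The real parser would use whatever analyzer the schema defines.

```python
import re


def resolve_colon_term(text, public_fields):
    """Apply the proposed fallback rule to a single query term.

    Returns (field, tokens). field is None when the text before the first
    colon is not a known public field, in which case the whole string is
    tokenized as an ordinary term.
    """
    prefix, sep, rest = text.partition(":")
    if sep and prefix in public_fields:
        # "foo:bar" where foo is a public field: restrict the search to foo.
        field, body = prefix, rest
    else:
        # No colon, unknown field, or private field: treat it all as a term.
        field, body = None, text
    # Hypothetical tokenizer: split on anything that is not a letter or digit.
    tokens = re.findall(r"[A-Za-z0-9]+", body)
    return field, tokens
```

With `public_fields = {"module", "title"}`, the three queries above come out as a search of the module field for PHP and Interpreter, an unrestricted search for PHP and Interpreter, and an unrestricted search for secret, field, and whatever. Note that a private field and a nonexistent field take the same branch, so the parser never reveals which fields exist.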
