Lucyans,

Marvin and I were just discussing the QueryParser on IRC. Years ago, I reported 
a bug in the KinoSearch query parser:

  http://www.rectangular.com/pipermail/kinosearch/2006-May/004992.html

Basically, if I searched on "PHP::Interpreter", the parser died. Marvin fixed 
this bug and, I think partly as a result, introduced the `heed_colons` 
attribute that persists today in Lucy::Search::QueryParser. But as I 
understand it, `heed_colons` has three issues:

1. It adds complexity to the parser (simpler is better).
2. It has a security vulnerability: If a user searches on "secret_field:foo", 
it will search only secret_field, and you might not want that.
3. If a field doesn't exist, the results may be meaningless.

In discussing these issues with Marvin, he expressed a strong desire not to get 
into QueryParser wars, and I can understand that. I think that one of the 
strengths of Lucy is that the default QueryParser offers a decent 80% solution 
for most users, while giving toolkit hackers the power to do even more. 
With that in mind, I think we've come up with a solution to the above issues 
that actually *simplifies* QP a bit:

* Deprecate `heed_colons`. Always heed colons.
* If you search for "foo:bar" and the field "foo" doesn't exist or is not 
public, treat it as a term.

So addressing the above three points, this change would:

1. Remove complexity (or at least deprecate it).
2. Prevent private fields from being searched.
3. Return relevant results when a colon term does not match a public field.

As a result, "module:PHP::Interpreter" will properly search "PHP OR Interpreter 
IN module", "PHP::Interpreter" will search "PHP OR Interpreter", and 
"secret_field:whatever" will search "secret OR field OR whatever".
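
To make the proposed rule concrete, here is a rough sketch in Python. It is not Lucy's API or its actual parser: the `PUBLIC_FIELDS` table, `tokenize`, and `parse` are names I made up for illustration, the tokenizer is a crude stand-in for a real analyzer, and "module" and "secret_field" are assumed to be the only fields in the schema.

```python
import re

# Hypothetical schema: field name -> whether the field is publicly searchable.
PUBLIC_FIELDS = {"module": True, "secret_field": False}


def tokenize(text):
    """Split on anything that is not a letter or digit (stand-in analyzer)."""
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", text) if tok]


def parse(query):
    """Return (field, terms). field is None when no field restriction applies."""
    prefix, colon, rest = query.partition(":")
    # Heed the colon only when the prefix names an existing, public field.
    if colon and PUBLIC_FIELDS.get(prefix, False):
        return prefix, tokenize(rest)
    # Otherwise treat the whole string, prefix included, as plain terms.
    return None, tokenize(query)


for q in ("module:PHP::Interpreter", "PHP::Interpreter", "secret_field:whatever"):
    print(q, "->", parse(q))
```

The terms in each result would be ORed together. The point of the sketch is that a single check (does the prefix name a public field?) covers both the unknown-field case and the private-field case, so no separate `heed_colons` switch is needed.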

Thoughts?

Best,

David
