On Apr 15, 2011, at 10:02 PM, Marvin Humphrey wrote:

> 1. Generate a TermQuery with field 'foo' and term 'bar'. This is the
>    current behavior, which we are ruling out because it makes it hard to
>    write a secure parser when you have sensitive fields.
> 2. Treat 'foo' as a distinct term, so that the query is parsed the same as
>    'foo bar'.
> 3. Treat 'foo:bar' as a single "leaf", which will then be expanded by
>    Expand_Leaf() and will be tokenized using field-specific Analyzers.
>    Most of the time, this will result in a PhraseQuery, as if you had typed
>    '"foo bar"'.
> 4. Generate a NoMatchQuery.
>
> Whatever option we choose, I hope that the parser can produce Queries which
> return sensible results for all of these:
>
>     http://www.apache.org/
>     mailto:[email protected]
>     PHP::Interpreter
>     10:30
>
> (Can others suggest more torture test query strings?)
Those are great examples. And given those, I think #3 is probably the best choice. In all those cases, with the possible exception of mailto:, a phrase is what I would expect.

> Our QueryParser, unlike the Lucene QueryParser, is primarily designed as a
> user-facing parser -- it never throws parse errors, it supports only widely
> popular syntax, etc. Options 2 and 3 are similar to what you get at Google
> today[1], and they are in the tolerant spirit of the current design.
> However, they are somewhat inconsistent from an interface design standpoint,
> and I worry that that makes QueryParser harder to grok and subclass.

This is largely a matter of precise documentation and a good API, though, yes? Also, is there a strict, Lucene-style parser?

> 0.2.0
>   * Always heed colons.
>   * Make QParser_Set_Heed_Colons() a no-op and deprecate it in the
>     documentation.
> 0.3.0
>   * Remove QParser_Set_Heed_Colons().

And at what point would one of the above four solutions be applied? I can see arguments for 0.1.0 and 0.2.0.

Best,

David
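
To make option 3 a bit more concrete, here is a rough Python sketch of how a colon-containing string might be treated as a single leaf and then expanded. This is only an illustration, not the actual QueryParser code: analyze(), expand_leaf() and parse() are hypothetical names, and the "analyzer" is assumed to be a trivial one that lowercases and splits on anything that is not a letter or digit. A real field-specific Analyzer could tokenize quite differently.

```python
import re


def analyze(text):
    """Hypothetical stand-in for an Analyzer: lowercase, split on non-alphanumerics."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def expand_leaf(leaf):
    """Turn one whitespace-delimited leaf into a toy query tuple (option 3)."""
    tokens = analyze(leaf)
    if not tokens:
        # Nothing survived analysis, so nothing can match.
        return ("nomatch",)
    if len(tokens) == 1:
        return ("term", tokens[0])
    # Several tokens from one leaf behave like a quoted phrase.
    return ("phrase", tokens)


def parse(query):
    """Split on whitespace only; colons are never treated as field separators."""
    return [expand_leaf(leaf) for leaf in query.split()]


if __name__ == "__main__":
    for q in ["foo:bar", "http://www.apache.org/", "PHP::Interpreter", "10:30"]:
        print(q, "=>", parse(q))
```

Under these assumptions, 'foo:bar' comes out the same as the phrase '"foo bar"', while a plain 'foo bar' still yields two separate terms.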
