Erik Hatcher wrote:
I'd like to revisit this issue. First, I add the path field to the Document in this way:

doc.add(Field.Keyword("path", path));

This field is, of course, not tokenized by the Analyzer, right? So shouldn't QueryParser take this fact into account on a field-by-field basis, so that a query such as "path:/whatever/blah" is not tokenized?

The problem is that you could also add tokenized values to the field named "path". Lucene doesn't require all values in a field to be either tokenized or untokenized; they can be mixed within any field. So there's no way at present for a query parser to know that a field contains only untokenized values.

We could change things so that the index restricts a given field to values that are either all tokenized or all untokenized, but that would break applications which (reasonably) do not require this. I think an early version of Lucene had this restriction, but it was subsequently removed.

Is it possible to have this type of smarts in the QueryParser such that it takes the field type into consideration before using an Analyzer?

Currently, the place for these smarts is in the Analyzer itself. An analyzer can return different token stream implementations based on the name of the field passed in.
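
As a rough illustration of that idea, here is a minimal standalone sketch in plain Java. It deliberately uses no Lucene classes: PerFieldAnalysisSketch, its tokens method, and the choice of "path" as the only keyword field are all hypothetical names and assumptions made for the example. In a real analyzer, the same branch on the field name would sit in whatever method receives the field name and produces the token stream.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a field-aware analyzer; not a Lucene class.
class PerFieldAnalysisSketch {
    // Names of fields whose values are kept as single, untokenized terms
    // (an assumption of this sketch, supplied by the caller).
    private final Set<String> keywordFields;

    PerFieldAnalysisSketch(Set<String> keywordFields) {
        this.keywordFields = new HashSet<>(keywordFields);
    }

    // Returns the terms produced for `text` when it belongs to `fieldName`.
    List<String> tokens(String fieldName, String text) {
        if (keywordFields.contains(fieldName)) {
            // Keyword field: the whole value is one term, left unchanged.
            return Collections.singletonList(text);
        }
        // Any other field: lowercase, then split on non-alphanumeric runs.
        List<String> result = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!piece.isEmpty()) { // a leading separator yields an empty piece
                result.add(piece);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PerFieldAnalysisSketch analyzer =
            new PerFieldAnalysisSketch(Collections.singleton("path"));
        // The same text is treated differently depending on the field name.
        System.out.println(analyzer.tokens("path", "/whatever/blah"));
        System.out.println(analyzer.tokens("content", "/whatever/blah"));
    }
}
```

With this arrangement the query text for "path" survives as the single term /whatever/blah, while the same text in "content" is broken into whatever and blah. The dispatch only works if the application itself guarantees that "path" never receives tokenized values, since the index does not enforce that.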

However, in most cases where this is an issue, the real problem is that folks are placing too much reliance on the query parser. The query parser is designed for user-entered queries. If you're programmatically generating query strings that are then fed to the query parser, then you would be better served by directly constructing queries.

Note that you can mix the use of the query parser with direct query construction, e.g.:

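// Combine a parsed user query with a directly constructed keyword clause.
BooleanQuery query = new BooleanQuery();
// Only the user-entered text goes through the query parser and analyzer.
Query parsedQuery = QueryParser.parse(userQuery, "content", analyzer);
query.add(parsedQuery, true, false);   // required, not prohibited
// The keyword term is built directly, so it is never tokenized.
Query keywordQuery = new TermQuery(new Term("keyword", keyword));
query.add(keywordQuery, true, false);  // required, not prohibited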

Doug

