Doug,

Your points are well taken and I appreciate your time in replying to this. I'm on the same wavelength with this thinking about QueryParser, and I realize I'm attempting to push it past its designed simplicity. I'm not as knowledgeable (and who is?!) about Lucene's API and design as you and many others here, and I've learned a lot in the past couple of days. I'll explore the Analyzer idea of returning different tokenizers based on the field being analyzed - that might just be the ticket.

I do think (though I haven't fully thought this through) that having a Keyword field stay a keyword field, rather than allowing other tokenized text to be added to it, is the better approach. I could change my mind as my experience with Lucene evolves, and will, of course, have to live with how it works now.

Since you brought up dates with QueryParser - its implementation seems a bit rough. What's the point of supporting date ranges in QueryParser if you cannot use a human-readable date? It's my understanding that you have to convert a Date to a collatable representation just to use it with QueryParser, right? So it has to be computer-generated anyway, and I might as well use the API to construct date-range queries directly. If my understanding of QueryParser's date support is wrong, please by all means correct me.
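To illustrate the "collatable representation" point above, here is a small self-contained sketch (not Lucene's actual DateField code - the helper name and format are my own for illustration) of the idea: a date rendered so that plain String comparison agrees with chronological order, which is what makes lexicographic range queries on a date field work at all.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

public class CollatableDates {
    // Hypothetical helper: format a Date so that lexicographic (String)
    // order matches chronological order. Lucene's DateField does the same
    // kind of encoding, though with a different representation.
    static String toCollatable(Date d) {
        SimpleDateFormat f = new SimpleDateFormat("yyyyMMddHHmmss");
        f.setTimeZone(TimeZone.getTimeZone("GMT"));
        return f.format(d);
    }

    public static void main(String[] args) {
        Calendar c = Calendar.getInstance(TimeZone.getTimeZone("GMT"));
        c.clear();
        c.set(2002, Calendar.JANUARY, 15);
        String earlier = toCollatable(c.getTime());
        c.clear();
        c.set(2002, Calendar.DECEMBER, 31);
        String later = toCollatable(c.getTime());

        System.out.println(earlier); // 20020115000000
        System.out.println(later);   // 20021231000000
        // String order agrees with date order, so a range query over
        // these encoded terms selects the right documents:
        System.out.println(earlier.compareTo(later) < 0); // true
    }
}
```

Since no human would type "20020115000000" into a search box, a date clause really does have to be program-generated either way - which is the argument for building it with the query API directly.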

And for the record, I am constructing some queries through QueryParser and some through the API, gluing them together as a BooleanQuery. My questions here are to increase my understanding of how to use the API more effectively and to leverage what is already easily available. Life is made easier by letting QueryParser take care of much of the dirty work, so you can't blame me for pushing the limits of what it can/should do. :)

Thanks again for your time.

Erik


On Tuesday, December 31, 2002, at 02:51 PM, Doug Cutting wrote:

Doug Cutting wrote:
However, in most cases where this is an issue, the real problem is that folks are placing too much reliance on the query parser. The query parser is designed for user-entered queries. If you're programmatically generating query strings that are then fed to the query parser, then you would be better served by directly constructing queries.
This bears emphasis. Abuse of the query parser may be the single most common source of problems with Lucene. We should probably add guidelines for query parser use to the FAQ and/or query parser documentation.

Some rules of thumb are:

- If you are programmatically generating a query string and then parsing it with the query parser then you should seriously consider building your queries directly with the query API. In other words, the query parser is designed for human-entered text, not for program-generated text.

- Untokenized fields are best added directly to queries, and not through the query parser. If a field's values are generated programmatically by the application, then so should query clauses for this field. Analyzers, like the query parser, are designed to convert human-entered text to terms. Program-generated values, like dates, keywords, etc., should be consistently program-generated.

- In a query form, fields which are general text should use the query parser. All others, e.g., date ranges, keywords, etc., are better added directly through the query API. A field with a limited set of values, that can be specified with a pulldown menu, should not be added to a query string which is subsequently parsed, but rather added as a TermQuery clause.
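As a sketch of that last rule (field names "contents" and "category" are made up for illustration, and this fragment assumes the Lucene jar is on the classpath), one might parse only the human-entered text and add the pulldown's value directly:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Only the text the user actually typed goes through QueryParser.
Query userQuery = QueryParser.parse(userText, "contents", new StandardAnalyzer());

// The pulldown selection is program-generated, so it is added as a
// TermQuery clause, never pasted into the query string.
BooleanQuery query = new BooleanQuery();
query.add(userQuery, true, false);                                      // required
query.add(new TermQuery(new Term("category", selection)), true, false); // required
```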

I hope that by saying the same thing several times in slightly different ways folks will get the idea! Of course, these are not absolute rules: there are exceptions. The query parser can do more than it should. But when this is done, problems frequently occur. Caveat emptor.

Doug


--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>


