Re: Query Term Questions

Erik Hatcher Wed, 21 Jan 2004 08:32:32 -0800

On Jan 21, 2004, at 10:01 AM, Terry Steichen wrote:

But doesn't the query itself take this into account?  If there are
multiple matching terms then the overlap (coord) factor kicks in.
TS==>Except that I'd like to be able to choose to do this on a query-by-query basis. In other words, it's desirable that some specific queries significantly increase their discrimination based on this multiple matching, relative to the normal extra boost given by the coord factor. However, I take it from your answer that there's not a way to do this in the query itself (at least using the unmodified, standard Lucene version).

Don't interpret my replies as being absolute here - I'm still learning lots about Lucene and am open to being shown new ways of doing things with it.
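
For reference, here is a minimal sketch of the coord factor under discussion. In Lucene's default similarity it is the fraction of query terms that a document matches. The class name CoordSketch and the steepCoord variant are hypothetical and are not part of Lucene; squaring is shown only to illustrate what a stronger reward for multiple matches could look like. Whether such a variant can be wired in through a custom Similarity in a given Lucene version is an assumption to verify.

```java
public class CoordSketch {
    // Default-style coord: the fraction of query terms found in the document.
    public static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Hypothetical steeper variant (not Lucene API): squaring the fraction
    // penalizes partial matches more heavily than the default does.
    public static float steepCoord(int overlap, int maxOverlap) {
        float c = coord(overlap, maxOverlap);
        return c * c;
    }

    public static void main(String[] args) {
        int maxOverlap = 4; // number of terms in the query
        for (int overlap = 1; overlap <= maxOverlap; overlap++) {
            System.out.println(overlap + " of " + maxOverlap + " terms matched:"
                    + " default=" + coord(overlap, maxOverlap)
                    + " steep=" + steepCoord(overlap, maxOverlap));
        }
    }
}
```

Note that a scoring change of this kind would apply to a whole search rather than to one clause of a query, so it only approximates the query-by-query control asked about above.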

Another reply mentioned negative boosting.  Is that not working as
you'd like?
TS==>I've not been able to get negative boosting to work at all. Maybe there's a problem with my syntax. If, for example, I do a search with "green beret"^10, it works just fine. But "green beret"^-2 gives me a ParseException showing a lexical error.

Have you tried it without using QueryParser, by calling setBoost on a Query directly? QueryParser is a double-edged sword: its boost syntax appears to accept only digits, optionally followed by "." and more digits, so the minus sign in ^-2 would explain the lexical error. The problem with negative boosts is in QueryParser, then, not in Query itself.
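
To make the boost-syntax point concrete, here is a small self-contained sketch. It does not call Lucene at all: the regular expression is only a model of the rule described above (digits, optionally followed by "." and more digits), and the class and method names are hypothetical. It shows why "^10" is accepted while "^-2" is rejected before any Query is built.

```java
import java.util.regex.Pattern;

public class BoostSyntaxCheck {
    // Model of the boost token as described above: one or more digits,
    // optionally followed by "." and one or more digits. No sign is allowed.
    private static final Pattern BOOST = Pattern.compile("[0-9]+(\\.[0-9]+)?");

    // Returns true if the text after "^" fits the modeled boost syntax.
    public static boolean isParsableBoost(String text) {
        return BOOST.matcher(text).matches();
    }

    public static void main(String[] args) {
        for (String boost : new String[] {"10", "0.5", "-2"}) {
            System.out.println("^" + boost + " -> "
                    + (isParsableBoost(boost) ? "accepted" : "rejected"));
        }
    }
}
```

A Query built in code never passes through this token rule, which is why setting the boost directly on the Query sidesteps the parser's restriction.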

Sounds like what you're really after is fancier analysis.  This is one
of the purposes of analysis, to do stemming.
TS==>Well, I hope I'm not trying to be fancy. It's just that listing all the different variants gets tedious and error-prone, particularly when (as in my case) I have to do this for multiple fields. The example above is simply one such case for a particular query - other queries may have entirely different desired combinations. Constructing a single stemmer to handle all such cases would be (for me, at least) very difficult. Besides, I tend to stay away from stemming because I believe it can introduce some rather unpredictable side-effects.

I'd still recommend trying some of the other analyzer options out there and seeing if you can tweak things to your liking. This is really the answer for what you are after, I'm almost certain. Good stemmers exist - look at the Porter one or the Snowball ones. Write some test cases to "analyze the analyzer" like I did in my java.net articles - it really will let you experiment with indexing and searching easily.
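
As a rough sketch of what "analyzing the analyzer" can look like, the harness below prints the tokens an analyzer produces for a piece of text. Everything here is a stand-in: simpleAnalyze is a hypothetical lowercase-and-split function, not a Lucene Analyzer, and it does no stemming. The idea is to swap in a real analyzer behind the same function shape and compare the token streams.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class AnalyzeTheAnalyzer {
    // Stand-in analyzer (NOT a Lucene Analyzer): lowercases the text and
    // splits it on anything that is not a letter a-z. No stemming is done.
    public static List<String> simpleAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Renders the tokens in brackets so the analyzer's output is easy to eyeball.
    public static String display(Function<String, List<String>> analyzer, String text) {
        StringBuilder sb = new StringBuilder();
        for (String token : analyzer.apply(text)) {
            sb.append('[').append(token).append("] ");
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(display(AnalyzeTheAnalyzer::simpleAnalyze, "Green Berets"));
    }
}
```

Because the stand-in leaves "berets" unstemmed, it would not match "beret" at search time; running the same text through a stemming analyzer in this harness is one way to see whether the variants collapse as hoped.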

Erik

