Re: inter-term correlation [was Re: Vector Space Model in Lucene?]

Doug Cutting Fri, 14 Nov 2003 13:36:53 -0800

I'm still confused what the issue here is. If you're interested in stopping exact phrase matching from crossing sentence boundaries, that's easy to do with setPositionIncrement(). If you want to score an exact phrase match higher than a sloppy phrase match, and in turn score this higher than a non-phrasal match, this is all easy to do with Lucene. Depending on how much control you need, you could do it with a single sloppy phrase query (perhaps with a custom Similarity implementation), or perhaps you'd need to combine an OR query of the terms with an exact phrase match and with a sloppy phrase match, each with different boosts.

Certainly there are lots of scoring algorithms that one cannot easily implement with Lucene. I'm just not yet clear on what you need to do that Lucene cannot support.

Cheers,

Doug

Chong, Herb wrote:

then i wouldn't have typed capital gains tax. there is psychology of query creation too and that is one thing i am taking advantage of.

Herb....
-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Friday, November 14, 2003 3:15 PM
To: Lucene Users List
Subject: Re: inter-term correlation [was Re: Vector Space Model in Lucene?]
Have sentence boundaries actually proven to be that userful in this sort of thing. For example, if the text were something like:

"Such sales would be considered long term capital gains. Tax on these is 20%."

Then penalizing for the sentence boundary wouldn't be valid.

Doug
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: inter-term correlation [was Re: Vector Space Model in Lucene?]

Reply via email to