Mark,

On Thursday 23 December 2004 22:20, markharw00d wrote:
> Thanks for the suggestions, Paul.
> 
> I've just tried a scheme using the max docFreq of the expanded terms as 
> the docFreq shared by all expanded terms in their idf calculations 
> (giving a lower, shared, IDF) and I'm still removing the coordination 
> factor on the BooleanQuery that groups the term queries.
> Results seem much more sensible than the existing way of handling fuzzy 
> queries. Here are some example results:

That's quick. Do you have a time shrinking machine there?

> 
> Query: smith~
> ==============
> New scheme top result: Smith Smith
> New scheme top score: 1.0
> Existing scheme top result: Smita Khurana
> Existing scheme top score: 0.02
> 
> 
> Query: pete~ smith~
> ==============
> New Scheme top result: Peter Smith
> New Scheme top score: 0.99
> Existing Scheme top result: Morrissey Pete
> Existing Scheme top score: 0.07
> 
> Query: David Harland~
> ==============
> New scheme top result: David Harland
> New scheme top score: 0.68
> Existing scheme top result: David Burland
> Existing scheme top score: 0.18
> 
> 
> I've currently amended FuzzyQuery to create new subclasses of 
> BooleanQuery and TermQuery which override the similarity methods coord 
> (for BooleanQuery) and idf (for TermQuery). This approach will need to 
> be taken by the other multi-term queries.
> Does this sound like the best way to do this?

The results look pretty good and it sounds like the code is compact.
What more could one wish?
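
To make sure I read the scheme correctly, here is a minimal standalone
sketch of it. It assumes an idf of the form used by DefaultSimilarity,
ln(numDocs / (docFreq + 1)) + 1. The class and method names are made up
for illustration; this is neither Lucene API nor your actual patch.

```java
import java.util.Map;

// Hypothetical sketch, independent of Lucene classes.
final class SharedIdfSketch {

    // idf in the style of DefaultSimilarity (assumed formula).
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // All expanded terms share one idf, computed from the largest
    // docFreq among them, so rare misspellings get no idf advantage.
    static double sharedIdf(Map<String, Integer> docFreqs, int numDocs) {
        int maxDocFreq = 0;
        for (int docFreq : docFreqs.values()) {
            maxDocFreq = Math.max(maxDocFreq, docFreq);
        }
        return idf(maxDocFreq, numDocs);
    }

    // Coordination factor switched off: matching more of the expanded
    // variants does not scale the score.
    static double coord(int overlap, int maxOverlap) {
        return 1.0;
    }
}
```

With, say, docFreq 500 for "smith" and 2 for "smita" in 10000 docs, both
terms would be weighted with the idf of "smith", which is the lower one.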

Does it also do summing before tf()? That would make it perfect, I think,
but it may be somewhat harder to implement. Summing before
tf() is useful in documents that have more than one variation
of the expanded term.
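
A small sketch of what I mean, assuming tf(freq) = sqrt(freq) as in
DefaultSimilarity. The names are hypothetical and only the tf part of
the score is shown.

```java
// Hypothetical sketch, independent of Lucene classes.
final class SumBeforeTfSketch {

    // tf in the style of DefaultSimilarity (assumed formula).
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // tf() applied to each variant separately, then summed.
    static double tfPerTerm(int[] freqs) {
        double sum = 0.0;
        for (int freq : freqs) {
            sum += tf(freq);
        }
        return sum;
    }

    // Frequencies of all variants summed first, then a single tf().
    static double tfOfSum(int[] freqs) {
        int total = 0;
        for (int freq : freqs) {
            total += freq;
        }
        return tf(total);
    }
}
```

For a document with two variants occurring once each, tfPerTerm gives
2.0 while tfOfSum gives sqrt(2), the same as for a document with one
variant occurring twice. Summing first treats the variants as one term.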

Regards,
Paul Elschot.

