: I'm hoping that coord similarity API can be changed from:
:    float coord(int overlap, int maxOverlap)
: ...
:    float coord(int overlap, int maxOverlap, int docSize)

That's a pretty significant change ... especially considering Lucene doesn't
know the docSize.

You may want to review the comments in another recent related thread that
suggested incorporating the average document length...

http://www.nabble.com/search-quality---assessment---improvements-tf3974580.html#a11701392

: score. Nothing can help here, changing lengthNorm to intentionally lower
: the score of car names as they get longer doesn't make sense, the "Volvo V70
: Wagon Luxury Edition Sports Package AWD" is just as much of a car as the

The long name may be "just as much of a car" as the short name, but the
lengthNorm by itself isn't really important; it's all relative. The
lengthNorm is just there to help offset other factors, such as higher tfs
and, for larger boolean queries, a higher coord factor.

Regarding your specific problem: other people have solved this using
PhraseQueries with extremely large slop, and sentinel terms indexed at the
start and end of their field values, i.e.:

   Doc1: _START_ Volvo V70 Wagon _END_
   Doc2: _START_ Volvo V70 Wagon Luxury Edition Sports Package AWD _END_
   User Input: Volvo V70 Wagon
   Query: SpanNearQuery(_START_, Volvo, V70, Wagon, _END_, 10000)

...both docs will match, but Doc1 will get a much higher score: the span
from _START_ to _END_ is much shorter in Doc1, and sloppy matches are
weighted by how close together the matching terms are.

-Hoss
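A toy model may help show why the sentinel trick ranks Doc1 above Doc2. This is plain Java, not Lucene code: the class and method names (SentinelSpanDemo, index, query, matchLength, sloppyScore) are hypothetical, and the weighting is only loosely modeled on Lucene's sloppy-match weighting (DefaultSimilarity.sloppyFreq returns 1/(distance+1)), applied here to the length of the match. It ignores tf, idf, lengthNorm, coord and everything else that goes into a real Lucene score.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Toy model (not Lucene code) of the sentinel-term trick: every field value
 * is indexed as _START_ ... _END_, and the query requires all terms, in order,
 * from _START_ through _END_. Tighter matches get a higher weight.
 */
public class SentinelSpanDemo {

    static final String START = "_START_";
    static final String END = "_END_";

    /** Lowercases and splits a field value, then wraps it in the sentinel tokens. */
    static List<String> index(String fieldValue) {
        String[] words = fieldValue.trim().toLowerCase().split("\\s+");
        String[] tokens = new String[words.length + 2];
        tokens[0] = START;
        System.arraycopy(words, 0, tokens, 1, words.length);
        tokens[tokens.length - 1] = END;
        return Arrays.asList(tokens);
    }

    /** Query clauses are the user's words wrapped in the same sentinels. */
    static List<String> query(String userInput) {
        return index(userInput);
    }

    /**
     * Returns the length (in positions) of an in-order match of all clauses,
     * or -1 if some clause is missing or the match needs more than maxSlop.
     * Greedy earliest matching is enough here because the sentinels pin both
     * ends of the match (first and last positions are fixed).
     */
    static int matchLength(List<String> doc, List<String> clauses, int maxSlop) {
        int[] positions = new int[clauses.size()];
        int from = 0;
        for (int i = 0; i < clauses.size(); i++) {
            int pos = -1;
            for (int p = from; p < doc.size(); p++) {
                if (doc.get(p).equals(clauses.get(i))) {
                    pos = p;
                    break;
                }
            }
            if (pos < 0) {
                return -1;          // a clause is missing: no match
            }
            positions[i] = pos;
            from = pos + 1;         // the next clause must come later (in order)
        }
        int first = positions[0];
        int last = positions[positions.length - 1];
        // slop = number of non-clause positions between the first and last clause
        int slop = (last - first) - (clauses.size() - 1);
        if (slop > maxSlop) {
            return -1;
        }
        return last - first + 1;
    }

    /** Weight of a match: 1 / (length + 1), so shorter matches weigh more. */
    static float sloppyScore(int matchLength) {
        return matchLength < 0 ? 0f : 1.0f / (matchLength + 1);
    }

    public static void main(String[] args) {
        List<String> doc1 = index("Volvo V70 Wagon");
        List<String> doc2 = index("Volvo V70 Wagon Luxury Edition Sports Package AWD");
        List<String> q = query("Volvo V70 Wagon");
        int slop = 10000; // "extremely large slop", as in the message
        System.out.println("Doc1: " + sloppyScore(matchLength(doc1, q, slop)));
        System.out.println("Doc2: " + sloppyScore(matchLength(doc2, q, slop)));
    }
}
```

Both documents match, but Doc1's match spans 5 positions versus 10 for Doc2, so Doc1 gets the larger weight. With a slop of 0 only the exact field value would match, which is why the large slop matters: it keeps longer names in the results while still preferring the tighter fit.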