hello, > i can think of two possibilities you might be refering to when you say > "noise" ... one is that the lengthNorm for docs with many variant > titles causes matches in those titles to not score as well as > documents with only one title -- this can be dealt with by overriding > the lengthNorm function so that when used on your title field, it > returns a constant value (or by index your title Field with > setOmitNorms(true)) ... this will completley eliminate the length of > all titles from scoring consideration, but it is an option.
yes - i guess this is more or less what i mean. an example are the two documents: 1 - with the titles: "http" "hypertext transfer protocol" 2 - with the title: "http tunnel" when i use multi-valued fields and do a search on "http" the title score on the second document is higher as there is a match and the length is shorter. as the first title of the first document would be a perfect match this one should get the higher score instead. disabling the length normalization sounds good - while it may not help to find the more relevant title at least it won't give a bad score to good titles. > As far as I can figure, your best bet really is to use a seperate > field for each title -- but instead of combining the queries into a > BooleanQuery, use a DisjunctionMaxQuery with a tiebreaker value of > 0.0f ... the whole purpose of that query is to score documents based > on the maximum score of a sub query, but you still have to make the > sub queries your self, and there's no easy way to make a query that > only searches the first "chunk" of terms from a field. ok. this could also be a possible solution. i'll evaluate both methods and use the "better" one. thanks for the help. /gst
signature.asc
Description: Digital signature