This question probably belongs on java-user@, not gene...@.
That said, coord() might be what you're looking for:

http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/search/Similarity.html#coord%28int,%20int%29

Doug

tavi.nathanson wrote:
Hey everyone,

Let me start with an example query: [apple orange banana]

I would like to heavily boost documents containing a greater number of unique query terms (apple, orange, banana), without MUST'ing the terms; in other words, a document containing just 2 unique terms (apple, banana) should have a higher score than a document containing 10 or 20 of the same term (10 apples).

I'm using SHOULD right now, and TF is defeating me; documents containing a ton of the *same* term are overpowering documents with a few unique terms.

Is there a standard way to accomplish what I'm looking for? I can think of several hacks, but I don't really like them:

- I can do a union of a query with MUST and a query with SHOULD, and boost the MUST part, but that doesn't help me with a document that contains apple and banana (but not orange).
- Perhaps I could lower the impact of TF (although I'm not sure what the best way of doing this would be).

Thanks so much!
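
To make the coord() suggestion concrete, here is a rough sketch in plain Java with no Lucene dependency. It is a toy model, not Lucene's real scoring formula: it keeps only the two factors discussed in this thread (tf as the square root of the term frequency, and coord as the fraction of query terms matched, which is what DefaultSimilarity does) and ignores idf, norms, and boosts. The class name CoordSketch, the method names, and the idea of raising coord to an exponent are assumptions made for illustration, not anything from the Lucene API.

```java
public class CoordSketch {
    // Same shape as DefaultSimilarity.tf(): square root of the term frequency.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Hypothetical coord: the default overlap/maxOverlap fraction raised to
    // an exponent. exponent = 1.0 reproduces the default behaviour; larger
    // values penalize documents that match fewer unique query terms.
    static double coord(int overlap, int maxOverlap, double exponent) {
        return Math.pow((double) overlap / maxOverlap, exponent);
    }

    // Toy score: sum of tf over the matched terms, multiplied by coord.
    // freqs[i] is how often query term i occurs in the document.
    static double score(int[] freqs, double exponent) {
        double sum = 0.0;
        int overlap = 0;
        for (int f : freqs) {
            if (f > 0) {
                sum += tf(f);
                overlap++;
            }
        }
        return sum * coord(overlap, freqs.length, exponent);
    }

    public static void main(String[] args) {
        int[] twoUnique = {1, 0, 1};   // one apple, no orange, one banana
        int[] oneRepeated = {20, 0, 0}; // twenty apples only

        System.out.println("default coord: two unique = " + score(twoUnique, 1.0)
                + ", one repeated = " + score(oneRepeated, 1.0));
        System.out.println("steeper coord: two unique = " + score(twoUnique, 2.0)
                + ", one repeated = " + score(oneRepeated, 2.0));
    }
}
```

In this toy model the 20-apple document still edges out the apple-and-banana document with the default coord (roughly 1.49 versus 1.33), while squaring coord reverses the order (roughly 0.50 versus 0.89). In Lucene 3.0 the analogous change would presumably be a DefaultSimilarity subclass that overrides coord(int, int), installed on the searcher with setSimilarity(); the right steepness would need tuning against real data. Overriding tf(float) in the same subclass would be one way to try the "lower the impact of TF" idea from the question.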
