Hi, I'm struggling to create a performant query in Lucene 3.0.0 in which I want to combine 'regular' scoring with scores derived from external sources.
For each document a fixed set of scores is calculated in the range [0.0, 1.0>. These scores represent the confidences that a document falls into categories. So for example document #1 has a score of 0.3 for cat=boys, 0.2 for cat=girls, 0.1 for cat=toys, 0.05 for cat=animals. The 'regular' scoring is calculated using a BooleanQuery with TermQuerys similar to: -type:H +(title:dna body:dna^1.5) In the current naive approach I'm combining the scores as following: - for each document store the three best categories in the following fields: name=cat1st value=boys fieldboost=0.3 name=cat2nd value=girls fieldboost=0.2 name=cat3rd value=toys fieldboost=0.1 Search-time use the following query if you're interested in 'girls': -type:H +(title:dna body:dna^1.5) cat1st:girls cat2nd:girls cat3rd:girls or if you're interested in 'boys': -type:H +(title:dna body:dna^1.5) cat1st:boys cat2nd:boys cat3rd:boys Disadvantages of the current approach: - loss of precision encoding/decoding boosts (performance is important, so this might be acceptable) - using TermQuery for the cat fields doesn't make a lot of sense since the external scores are multiplied by the idf of 'boys'/'girls' and the querynorm - the resulting score from the cat field is added to the other query score instead of multiplied Just to give you an idea: the index I'm using is growing in time and contains about 50 million documents Do you have an idea how I can improve my query and still keep high performance? Or should I combine the scores in the Collector (but this doesn't seem the right place to retrieve the category scores from the index)? Is it possible to use a different float->byte encoder per field to reduce the lack of precision? Thanks for your time, Dennis --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org