Have you considered the function query stuff? oal.search.function.CustomScoreQuery and friends. If you provide your own CustomScoreQuery implementation you can do scoring however you like.
-- Ian. On Mon, Feb 1, 2010 at 7:08 AM, Dennis Hendriksen <dennis.hendrik...@kalooga.com> wrote: > Hi Steve, > > Thank you for your suggestions. Payloads might indeed help me to > overcome the precision loss problem that I am experiencing right now. I > don't think it will help me with the combining of Lucene scores with > external scores however. > > Is there anyone who has a suggestion how to deal with that? > > Dennis > > On Thu, 2010-01-28 at 13:52 -0500, Steven A Rowe wrote: >> Hi Dennis, >> >> You should check out payloads (arbitrary per-index-term byte[] arrays), >> which can be used to encode values which are then incorporated into >> documents' scores, by overriding Similarity.scorePayload(): >> >> <http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/search/Similarity.html#scorePayload%28int,%20java.lang.String,%20int,%20int,%20byte[],%20int,%20int%29> >> >> The Lucene in Action 2 MEAP has a nice introduction to using payloads to >> influence scoring, in section 6.5. >> >> See also this (slightly out-of-date*) blog post "Getting Started with >> Payloads" by Grant Ingersoll at Lucid Imagination: >> >> <http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/> >> >> *Note that since this blog post was written, BoostingTermQuery was renamed >> to PayloadTermQuery (in Lucene 2.9.0+ ; see >> http://issues.apache.org/jira/browse/LUCENE-1827 ; wow - this issue isn't >> mentioned in CHANGES.txt???): >> >> <http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/search/payloads/PayloadTermQuery.html> >> >> Steve >> >> On 01/28/2010 at 6:01 AM, Dennis Hendriksen wrote: >> > I'm struggling to create a performant query in Lucene 3.0.0 in which I >> > want to combine 'regular' scoring with scores derived from external >> > sources. >> > >> > For each document a fixed set of scores is calculated in the range [0.0, >> > 1.0>. These scores represent the confidences that a document falls into >> > categories. So for example document #1 has a score of 0.3 for cat=boys, >> > 0.2 for cat=girls, 0.1 for cat=toys, 0.05 for cat=animals. >> > >> > The 'regular' scoring is calculated using a BooleanQuery with TermQuerys >> > similar to: -type:H +(title:dna body:dna^1.5) >> > >> > In the current naive approach I'm combining the scores as following: - >> > for each document store the three best categories in the following >> > fields: >> > name=cat1st value=boys fieldboost=0.3 >> > name=cat2nd value=girls fieldboost=0.2 >> > name=cat3rd value=toys fieldboost=0.1 >> > Search-time use the following query if you're interested in 'girls': >> > -type:H +(title:dna body:dna^1.5) cat1st:girls cat2nd:girls cat3rd:girls >> > or if you're interested in 'boys': >> > -type:H +(title:dna body:dna^1.5) cat1st:boys cat2nd:boys cat3rd:boys >> > >> > Disadvantages of the current approach: >> > - loss of precision encoding/decoding boosts (performance is important, >> > so this might be acceptable) >> > - using TermQuery for the cat fields doesn't make a lot of sense since >> > the external scores are multiplied by the idf of 'boys'/'girls' and >> > the querynorm >> > - the resulting score from the cat field is added to the other query >> > score instead of multiplied >> > >> > Just to give you an idea: the index I'm using is growing in time and >> > contains about 50 million documents >> > >> > Do you have an idea how I can improve my query and still keep high >> > performance? Or should I combine the scores in the Collector (but this >> > doesn't seem the right place to retrieve the category scores from the >> > index)? Is it possible to use a different float->byte encoder per field >> > to reduce the lack of precision? >> > >> > Thanks for your time, >> > Dennis >> >> > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org