On Thu, Dec 01, 2011 at 12:07:47PM +0200, goran kent wrote:
> The page at
> http://incubator.apache.org/lucy/docs/perl/Lucy/Plan/FieldType.html
> is a bit sparse on detail about the boost property.
> I'd like to get a better understanding of how and by how much its
> value influences score (rank) in search results - what's the formula
> used when boost is applied to a document's score?
It's pretty complicated. Field boost, document boost, and field length normalization are all combined into a single value, which is then reduced to an 8-bit float with a 3-bit mantissa and a 5-bit exponent. Because this lossy compression is so coarse, small changes to boost may not move the needle at all. I wouldn't bother with a field or document boost multiplier that doesn't change things by at least a factor of 2.

It's theoretically possible to calculate ceiling and floor values for boost, but I don't know what the answers are.

> Finally, what are reasonable values (upper/lower) for boost when, in
> my case e.g., I'd like to influence the score based on an external value
> (page rank), but not have my page rank completely skew the scores -
> just enough to promote pages which have an organic page rank value
> which should be considered to some degree (a very broad subject, I
> know).

Subtle rerankings are problematic because search engines are noisy. Even the best ones give you a bunch of junk you don't need. We don't really care about fine distinctions, because if you sample a handful of documents with identical scores, odds are that they are *wildly* divergent in terms of what the user wants. We only care about big differences.

> My tests so far show that a boost value with a small variance in the
> mantissa has an almost zero influence on score/ranking. My thinking
> is to boost with something akin to $boost+=LogN(PR) - i.e. between 0-10
> (log scale). So this boils down to: is using a scale of 1-10 a good
> idea w.r.t. the Lucy boost property to influence ranking, or 10x that
> value?

I'd try 1-100. If that's too much, scale it back.

Marvin Humphrey
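To illustrate why small boost differences can vanish in the 8-bit encoding described above, here is a minimal sketch. It is a hypothetical Python re-implementation modeled on Lucene's `SmallFloat.floatToByte315` / `byte315ToFloat`; Lucy's own norm encoding is assumed to behave similarly but may differ in detail. The names `float_to_byte315`, `byte315_to_float`, and `count_levels` are illustrative only, not Lucy API. The sketch also quantizes a single value in isolation, whereas in Lucy the boosts and length normalization are combined first, so the real byte depends on that combined value.

```python
import struct

# Hypothetical sketch modeled on Lucene's SmallFloat.floatToByte315;
# not Lucy's actual code. Lucy's norm encoding may differ in detail.
MANTISSA_BITS = 3  # Lucene's parameter name; yields 4 steps per power of two
ZERO_EXP = 15
FZERO = (63 - ZERO_EXP) << MANTISSA_BITS  # offset that maps results into 0..255


def float_to_byte315(f: float) -> int:
    """Lossily quantize f into a single byte (0..255)."""
    # Reinterpret f as a 32-bit IEEE float and read its raw bits as a signed int.
    bits = struct.unpack("<i", struct.pack("<f", f))[0]
    # Drop the low-order bits, keeping sign, exponent, and top two stored mantissa bits.
    smallfloat = bits >> (24 - MANTISSA_BITS)
    if smallfloat <= FZERO:
        # Zero and negatives map to 0; tiny positives map to the smallest nonzero code.
        return 0 if bits <= 0 else 1
    if smallfloat >= FZERO + 0x100:
        return 255  # overflow saturates at the largest code
    return smallfloat - FZERO


def byte315_to_float(b: int) -> float:
    """Decode a byte produced by float_to_byte315."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << (24 - MANTISSA_BITS)) + ((63 - ZERO_EXP) << 24)
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def count_levels(lo: float, hi: float) -> int:
    """Number of distinct codes a value sweeping [lo, hi] can hit (lo, hi >= 0)."""
    # The encoding is monotonic for non-negative inputs, so the codes form a contiguous run.
    return float_to_byte315(hi) - float_to_byte315(lo) + 1


if __name__ == "__main__":
    for boost in (1.0, 1.1, 1.2, 1.25, 1.5, 2.0):
        code = float_to_byte315(boost)
        print(f"{boost:>5} -> byte {code:3d} -> decodes to {byte315_to_float(code)}")
    print("levels for boosts 1..10: ", count_levels(1.0, 10.0))
    print("levels for boosts 1..100:", count_levels(1.0, 100.0))
```

Under this scheme each power of two is split into only four steps (1.0, 1.25, 1.5, 1.75, 2.0, ...), so boosts of 1.0, 1.1 and 1.2 all decode to 1.0. A value sweeping 1 to 10 reaches 14 distinct levels, while 1 to 100 reaches 27, which is in line with the suggestion to try the wider range.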
