RE: Odd Boolean scoring behavior?

karl.wright Thu, 20 Jan 2011 14:31:19 -0800

The original query is fine, and has the boost as expected:

((+language:eng +(
    CutoffQueryWrapper((+value_0:bunker~0.8332333 +value_0:hill)^0.6666667) 
    CutoffQueryWrapper((+othervalue_0:bunker~0.8332333 +value_0:hill)^0.5714286) 
    CutoffQueryWrapper((+value_0:bunker~0.8332333 +othervalue_0:hill)^0.5714286) 
    CutoffQueryWrapper((+value_1:bunker~0.8332333 +value_0:hill)^0.6666667) 
    CutoffQueryWrapper((+othervalue_1:bunker~0.8332333 +value_0:hill)^0.5714286) 
    CutoffQueryWrapper((+value_1:bunker~0.8332333 +othervalue_0:hill)^0.5714286)
...
    CutoffQueryWrapper((+othervalue_7:bunker~0.8332333 +value_7:hillmonument~0.8332333)^0.85714287) 
    CutoffQueryWrapper((+value_7:bunker~0.8332333 +othervalue_7:hillmonument~0.8332333)^0.85714287)))^3.0)
(
    CutoffQueryWrapper((+value_0:bunker~0.8332333 +value_0:hill)^0.6666667) 
    CutoffQueryWrapper((+othervalue_0:bunker~0.8332333 +value_0:hill)^0.5714286) 
    CutoffQueryWrapper((+value_0:bunker~0.8332333 +othervalue_0:hill)^0.5714286) 
    CutoffQueryWrapper((+value_1:bunker~0.8332333 +value_0:hill)^0.6666667)
...
))


The rewritten query is odd.  Here's a sample:


((+language:eng +(
    CutoffQueryWrapper((+() +value_0:hill)^0.6666667) 
    CutoffQueryWrapper((+() +value_0:hill)^0.5714286) 
    CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286) 
    CutoffQueryWrapper((+() +value_0:hill)^0.6666667) 
    CutoffQueryWrapper((+() +value_0:hill)^0.5714286) 
    CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286) 
    CutoffQueryWrapper((+(value_2:bunker value_2:burker^5.997396E-4) +value_0:hill)^0.6666667) 
    CutoffQueryWrapper((+() +value_0:hill)^0.5714286)

...

    CutoffQueryWrapper((+() +(()^0.5555556))^0.85714287)))^3.0)
(
    CutoffQueryWrapper((+() +value_0:hill)^0.6666667) 
    CutoffQueryWrapper((+() +value_0:hill)^0.5714286) 
    CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286) 
    CutoffQueryWrapper((+() +value_0:hill)^0.6666667) 
    CutoffQueryWrapper((+() +value_0:hill)^0.5714286) 
    CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286) 
    CutoffQueryWrapper((+(value_2:bunker value_2:burker^5.997396E-4) +value_0:hill)^0.6666667) 
    CutoffQueryWrapper((+() +value_0:hill)^0.5714286)
...
    CutoffQueryWrapper((+() +(()^0.5555556))^0.85714287) 
    CutoffQueryWrapper(+() +(()^0.6666667)) 
    CutoffQueryWrapper((+() +(()^0.6666667))^0.85714287) 
    CutoffQueryWrapper((+() +(()^0.5555556))^0.85714287)
)

As you can see, there are a lot of repeats, a lot of blank matches, but the 
original boost *is* still there.  I really can't interpret this any further - 
the many blank and repeated matches seem wrong to me, but the scorer 
explanation seems even more wrong.  Any ideas?

Karl


-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of ext Yonik Seeley
Sent: Thursday, January 20, 2011 3:34 PM
To: dev@lucene.apache.org
Subject: Re: Odd Boolean scoring behavior?

On Thu, Jan 20, 2011 at 3:06 PM,  <karl.wri...@nokia.com> wrote:
> I tried commenting out the final OR term, and that excluded all records that 
> were out-of-language as expected.  It's just the boost that doesn't seem to 
> work.

I see a lot of unexpected zeros - queryNorm has factors of idf and the
boost in it - the fact that it's 0 suggests that you used a 0 boost.

Why don't you do a toString() on your query and see if it's what you expect.

-Yonik
http://www.lucidimagination.com
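
For reference, the arithmetic in the language:eng branch of the quoted
explain output can be sketched as below. This is a simplified model of the
explain tree only, not Lucene code: the class and method names are
hypothetical, and the formulas (queryWeight = boost * idf * queryNorm,
fieldWeight = tf * idf * fieldNorm, score = queryWeight * fieldWeight) are
read off the "product of" lines in that output. It shows that once queryNorm
is 0.0, no boost value can change the score.

```java
// Hypothetical model of one term's explain subtree; not part of Lucene.
public class ExplainModel {

    // "queryWeight(...), product of:" boost (omitted in the output when 1.0), idf, queryNorm
    public static float queryWeight(float boost, float idf, float queryNorm) {
        return boost * idf * queryNorm;
    }

    // "fieldWeight(...), product of:" tf, idf, fieldNorm
    public static float fieldWeight(float tf, float idf, float fieldNorm) {
        return tf * idf * fieldNorm;
    }

    // "weight(...), product of:" queryWeight and fieldWeight
    public static float termScore(float boost, float idf, float queryNorm,
                                  float tf, float fieldNorm) {
        return queryWeight(boost, idf, queryNorm) * fieldWeight(tf, idf, fieldNorm);
    }

    public static void main(String[] args) {
        // Values taken from the language:eng lines of the explain output.
        float idf = 1.0f, tf = 1.0f, fieldNorm = 1.0f, queryNorm = 0.0f;

        // Try several boosts; each one is multiplied by the zero queryNorm.
        for (float boost : new float[] {1.0f, 3.0f, 10.0f}) {
            System.out.println("boost=" + boost + " score="
                    + termScore(boost, idf, queryNorm, tf, fieldNorm));
        }
    }
}
```

With a non-zero queryNorm the same model does respond to the boost, which is
consistent with the zero queryNorm, rather than the Boolean structure, being
the thing to chase.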



> Exploring the explain is challenging because of its size, but there are NO 
> boosts recorded of the size I am using (10.0).  Here's the basic structure of 
> the first result.
>
> 0.0 = (MATCH) sum of:
>  0.0 = (MATCH) sum of:
>    0.0 = (MATCH) weight(language:eng in 52867945), product of:
>      0.0 = queryWeight(language:eng), product of:
>        1.0 = idf(docFreq=23889670, maxDocs=59327671)
>        0.0 = queryNorm
>      1.0 = (MATCH) fieldWeight(language:eng in 52867945), product of:
>        1.0 = tf(termFreq(language:eng)=0)
>        1.0 = idf(docFreq=23889670, maxDocs=59327671)
>        1.0 = fieldNorm(field=language, doc=52867945)
>    0.0 = (MATCH) product of:
>      0.0 = (MATCH) sum of:
>        0.0 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4 
> othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4 
> othervalue_5:bunker othervalue_5:bunner^5.997396E-4 
> othervalue_5:burker^5.997396E-4) +value_5:hill)^0.5714286), product of:
>          1.0 = boost
>          0.0 = queryNorm
>        0.0 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 
> value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 
> value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 
> value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 
> value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 
> value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 
> value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker 
> value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 
> value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 
> value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 
> value_5:busker^5.997396E-4) +othervalue_5:hill)^0.5714286), product of:
>          1.0 = boost
>          0.0 = queryNorm
>
> ...
>
>      0.0069078947 = coord(21/3040)
>  0.0 = (MATCH) product of:
>    0.0 = (MATCH) sum of:
>      0.0 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4 
> othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4 
> othervalue_5:bunker othervalue_5:bunner^5.997396E-4 
> othervalue_5:burker^5.997396E-4) +value_5:hill)^0.5714286), product of:
>        1.0 = boost
>        0.0 = queryNorm
>      0.0 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 
> value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 
> value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 
> value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 
> value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 
> value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 
> value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker 
> value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 
> value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 
> value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 
> value_5:busker^5.997396E-4) +othervalue_5:hill)^0.5714286), product of:
>        1.0 = boost
>        0.0 = queryNorm
>
> ...
>
>    0.0069078947 = coord(21/3040)
>
> It looks like the PRODUCT_OF and SUM_OF nodes, which represent the Boolean 
> logic, do not actually apply the boost?
>
> Karl
>
>
>
> -----Original Message-----
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of ext Yonik 
> Seeley
> Sent: Thursday, January 20, 2011 2:36 PM
> To: dev@lucene.apache.org
> Subject: Re: Odd Boolean scoring behavior?
>
> On Thu, Jan 20, 2011 at 2:17 PM,  <karl.wri...@nokia.com> wrote:
>> The problem is that the LANGUAGE_BOOST boost doesn't seem to be having any
>> effect.  I can change it all over the place, and nothing much changes.
>
> Then perhaps your language term doesn't actually match anything in the
> index?  (i.e. how is it analyzed?)
> Next step would be to get score explanations (just add debugQuery=true
> if you're using Solr, or see IndexSearcher.explain() if not).
>
> -Yonik
> http://www.lucidimagination.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org