So I think I understand where the blank values and repeats come from. Those
are the expansions of fuzzy queries against fields that have no terms at all
matching the fuzzy values in question, so those are indeed OK.
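The numbers in the rewritten query further down fit a simple model of fuzzy expansion. The sketch below is a standalone Java model, not Lucene code, and it rests on assumptions: similarity = 1 - editDistance / (length of the shorter term) with no prefix, terms at or below the minimum similarity are dropped, and each surviving term gets a boost of (similarity - minSim) / (1 - minSim). The class `FuzzyExpansionSketch`, its methods, and the sample term lists are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone model of fuzzy-term expansion; not Lucene code.
public class FuzzyExpansionSketch {

    // Classic Levenshtein distance: insert, delete and substitute each cost 1.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed similarity: 1 - distance / length of the shorter term (no prefix).
    static float similarity(String target, String term) {
        return 1.0f - (float) editDistance(target, term) / Math.min(target.length(), term.length());
    }

    // Assumed per-term boost: rescaled so minSim maps to 0 and an exact match to 1.
    static float fuzzyBoost(String target, String term, float minSim) {
        return (similarity(target, term) - minSim) / (1.0f - minSim);
    }

    // Expands `target` against one field's terms; yields "()" when no term clears minSim.
    static String expand(String field, String target, float minSim, List<String> fieldTerms) {
        List<String> clauses = new ArrayList<>();
        for (String term : fieldTerms) {
            if (similarity(target, term) <= minSim) continue;  // below the cutoff: dropped
            String clause = field + ":" + term;
            if (!term.equals(target)) clause += "^" + fuzzyBoost(target, term, minSim);
            clauses.add(clause);
        }
        return "(" + String.join(" ", clauses) + ")";
    }

    public static void main(String[] args) {
        float minSim = 0.8332333f;
        // A field containing "bunker" and "burker": the exact term plus a tiny-boost neighbour.
        System.out.println(expand("value_2", "bunker", minSim, List.of("bunker", "burker", "hill")));
        // A field with no term close to "bunker": the expansion is empty and prints as "()".
        System.out.println(expand("value_0", "bunker", minSim, List.of("hill", "monument")));
    }
}
```

Under these assumptions a field with nothing near "bunker" produces "()", like the +() clauses in the rewrite, and "burker" (one edit away from a six-letter term) gets a boost of roughly 6.0E-4, in line with the ^5.997396E-4 that shows up there.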

I guess, then, that the real problem is that the scoring explanation makes no
sense. I'm going to pick that apart next and see why.
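As a starting point for picking the explanation apart, here is a standalone sketch of the arithmetic behind each BooleanQuery "product of" node in the explanation quoted further down: the "sum of" the matching clause scores, multiplied by coord(overlap/maxOverlap). It assumes a DefaultSimilarity-style coord of overlap / maxOverlap; `BooleanExplainSketch` and its methods are hypothetical, not Lucene API.

```java
import java.util.Arrays;

// Hypothetical model of a BooleanQuery explanation node; not Lucene code.
public class BooleanExplainSketch {

    // Assumed DefaultSimilarity-style coordination factor.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // "product of: (sum of matching clause scores) x coord(overlap/maxOverlap)"
    static float booleanScore(float[] matchingClauseScores, int maxOverlap) {
        float sum = 0f;
        for (float s : matchingClauseScores) sum += s;
        return sum * coord(matchingClauseScores.length, maxOverlap);
    }

    public static void main(String[] args) {
        // 21 of 3040 optional clauses matched, as in coord(21/3040).
        System.out.println("coord = " + coord(21, 3040));

        // Every clause scored 0.0 (e.g. because queryNorm was 0.0): the node stays 0.0.
        System.out.println("all-zero clauses = " + booleanScore(new float[21], 3040));

        // The same overlap with non-zero clause scores, for comparison.
        float[] ones = new float[21];
        Arrays.fill(ones, 1.0f);
        System.out.println("unit clauses = " + booleanScore(ones, 3040));
    }
}
```

The point of the sketch is that coord only scales the sum: once every clause score is 0.0, no coord factor and no outer boost can move the result above zero, so the zeros have to be traced to the clause scores themselves.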

Karl

-----Original Message-----
From: ext karl.wri...@nokia.com [mailto:karl.wri...@nokia.com] 
Sent: Thursday, January 20, 2011 5:31 PM
To: dev@lucene.apache.org; yo...@lucidimagination.com
Subject: RE: Odd Boolean scoring behavior?

The original query is fine, and has the boost as expected:

((+language:eng +(
    CutoffQueryWrapper((+value_0:bunker~0.8332333 +value_0:hill)^0.6666667)
    CutoffQueryWrapper((+othervalue_0:bunker~0.8332333 +value_0:hill)^0.5714286)
    CutoffQueryWrapper((+value_0:bunker~0.8332333 +othervalue_0:hill)^0.5714286)
    CutoffQueryWrapper((+value_1:bunker~0.8332333 +value_0:hill)^0.6666667)
    CutoffQueryWrapper((+othervalue_1:bunker~0.8332333 +value_0:hill)^0.5714286)
    CutoffQueryWrapper((+value_1:bunker~0.8332333 +othervalue_0:hill)^0.5714286)
...
    CutoffQueryWrapper((+othervalue_7:bunker~0.8332333 +value_7:hillmonument~0.8332333)^0.85714287)
    CutoffQueryWrapper((+value_7:bunker~0.8332333 +othervalue_7:hillmonument~0.8332333)^0.85714287)))^3.0)
(
    CutoffQueryWrapper((+value_0:bunker~0.8332333 +value_0:hill)^0.6666667)
    CutoffQueryWrapper((+othervalue_0:bunker~0.8332333 +value_0:hill)^0.5714286)
    CutoffQueryWrapper((+value_0:bunker~0.8332333 +othervalue_0:hill)^0.5714286)
    CutoffQueryWrapper((+value_1:bunker~0.8332333 +value_0:hill)^0.6666667)
...
))

The rewritten query is odd.  Here's a sample:


((+language:eng +(
    CutoffQueryWrapper((+() +value_0:hill)^0.6666667)
    CutoffQueryWrapper((+() +value_0:hill)^0.5714286)
    CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286)
    CutoffQueryWrapper((+() +value_0:hill)^0.6666667)
    CutoffQueryWrapper((+() +value_0:hill)^0.5714286)
    CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286)
    CutoffQueryWrapper((+(value_2:bunker value_2:burker^5.997396E-4) +value_0:hill)^0.6666667)
    CutoffQueryWrapper((+() +value_0:hill)^0.5714286)
...
    CutoffQueryWrapper((+() +(()^0.5555556))^0.85714287)))^3.0)
(
    CutoffQueryWrapper((+() +value_0:hill)^0.6666667)
    CutoffQueryWrapper((+() +value_0:hill)^0.5714286)
    CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286)
    CutoffQueryWrapper((+() +value_0:hill)^0.6666667)
    CutoffQueryWrapper((+() +value_0:hill)^0.5714286)
    CutoffQueryWrapper((+() +othervalue_0:hill)^0.5714286)
    CutoffQueryWrapper((+(value_2:bunker value_2:burker^5.997396E-4) +value_0:hill)^0.6666667)
    CutoffQueryWrapper((+() +value_0:hill)^0.5714286)
...
    CutoffQueryWrapper((+() +(()^0.5555556))^0.85714287)
    CutoffQueryWrapper(+() +(()^0.6666667))
    CutoffQueryWrapper((+() +(()^0.6666667))^0.85714287)
    CutoffQueryWrapper((+() +(()^0.5555556))^0.85714287)
)

As you can see, there are a lot of repeats and a lot of blank matches, but the
original boost *is* still there. I really can't interpret this any further:
the many blank and repeated matches seem wrong to me, but the scorer
explanation seems even more wrong. Any ideas?

Karl


-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of ext Yonik Seeley
Sent: Thursday, January 20, 2011 3:34 PM
To: dev@lucene.apache.org
Subject: Re: Odd Boolean scoring behavior?

On Thu, Jan 20, 2011 at 3:06 PM,  <karl.wri...@nokia.com> wrote:
> I tried commenting out the final OR term, and that excluded all records that 
> were out-of-language as expected.  It's just the boost that doesn't seem to 
> work.

I see a lot of unexpected zeros. queryNorm has factors of idf and the
boost in it, so the fact that it's 0 suggests that you used a 0 boost.
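To make the queryNorm point concrete, here is a standalone sketch, not Lucene code, of a DefaultSimilarity-style calculation: queryNorm = 1 / sqrt(sum of (idf * boost)^2 over the clauses), and each clause's queryWeight = idf * boost * queryNorm, matching the "queryWeight, product of" lines in the explanation. `QueryNormSketch` and its methods are hypothetical, and the flat list of (idf, boost) pairs is a simplification that ignores nested BooleanQuery boosts.

```java
// Hypothetical model of query normalization; not Lucene code.
public class QueryNormSketch {

    // Assumed DefaultSimilarity-style norm: 1 / sqrt(sum of squared weights).
    // (Lucene may also guard against infinite or NaN norms; this sketch does not.)
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    // Each clause contributes (idf * boost)^2 to the sum.
    static float sumOfSquaredWeights(float[] idfs, float[] boosts) {
        float sum = 0f;
        for (int i = 0; i < idfs.length; i++) {
            float w = idfs[i] * boosts[i];
            sum += w * w;
        }
        return sum;
    }

    // queryWeight = idf * boost * queryNorm.
    static float queryWeight(float idf, float boost, float norm) {
        return idf * boost * norm;
    }

    public static void main(String[] args) {
        float[] idfs = {1.0f, 1.0f};     // idf is 1.0 in the explanation
        float[] boosts = {3.0f, 1.0f};   // e.g. a ^3.0 language clause plus an unboosted one
        float norm = queryNorm(sumOfSquaredWeights(idfs, boosts));
        System.out.println("norm = " + norm);
        System.out.println("boosted queryWeight   = " + queryWeight(1.0f, 3.0f, norm));
        System.out.println("unboosted queryWeight = " + queryWeight(1.0f, 1.0f, norm));
        // With queryNorm = 0.0, as in the explanation, every weight collapses to 0.0.
        System.out.println("with 0.0 norm = " + queryWeight(1.0f, 3.0f, 0.0f));
    }
}
```

In this model the boost survives normalization as a 3:1 ratio between the two clause weights, but a queryNorm of 0.0 zeroes both, which is why no boost value appears to have any effect.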

Why don't you do a toString() on your query and see if it's what you expect.

-Yonik
http://www.lucidimagination.com



> Exploring the explain is challenging because of its size, but there are NO 
> boosts recorded of the size I am using (10.0).  Here's the basic structure of 
> the first result.
>
> 0.0 = (MATCH) sum of:
>  0.0 = (MATCH) sum of:
>    0.0 = (MATCH) weight(language:eng in 52867945), product of:
>      0.0 = queryWeight(language:eng), product of:
>        1.0 = idf(docFreq=23889670, maxDocs=59327671)
>        0.0 = queryNorm
>      1.0 = (MATCH) fieldWeight(language:eng in 52867945), product of:
>        1.0 = tf(termFreq(language:eng)=0)
>        1.0 = idf(docFreq=23889670, maxDocs=59327671)
>        1.0 = fieldNorm(field=language, doc=52867945)
>    0.0 = (MATCH) product of:
>      0.0 = (MATCH) sum of:
>        0.0 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4 
> othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4 
> othervalue_5:bunker othervalue_5:bunner^5.997396E-4 
> othervalue_5:burker^5.997396E-4) +value_5:hill)^0.5714286), product of:
>          1.0 = boost
>          0.0 = queryNorm
>        0.0 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 
> value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 
> value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 
> value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 
> value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 
> value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 
> value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker 
> value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 
> value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 
> value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 
> value_5:busker^5.997396E-4) +othervalue_5:hill)^0.5714286), product of:
>          1.0 = boost
>          0.0 = queryNorm
>
> ...
>
>      0.0069078947 = coord(21/3040)
>  0.0 = (MATCH) product of:
>    0.0 = (MATCH) sum of:
>      0.0 = (MATCH) CutoffQueryWrapper((+(othervalue_5:banker^5.997396E-4 
> othervalue_5:bucker^5.997396E-4 othervalue_5:bunder^5.997396E-4 
> othervalue_5:bunker othervalue_5:bunner^5.997396E-4 
> othervalue_5:burker^5.997396E-4) +value_5:hill)^0.5714286), product of:
>        1.0 = boost
>        0.0 = queryNorm
>      0.0 = (MATCH) CutoffQueryWrapper((+(value_5:banker^5.997396E-4 
> value_5:baunker^5.997396E-4 value_5:benker^5.997396E-4 
> value_5:beunker^5.997396E-4 value_5:binker^5.997396E-4 
> value_5:bonker^5.997396E-4 value_5:brunker^5.997396E-4 
> value_5:bucker^5.997396E-4 value_5:bueker^5.997396E-4 
> value_5:bunder^5.997396E-4 value_5:bunger^5.997396E-4 
> value_5:bunkek^5.997396E-4 value_5:bunken^5.997396E-4 value_5:bunker 
> value_5:bunkers^5.997396E-4 value_5:bunkeru^5.997396E-4 
> value_5:bunner^5.997396E-4 value_5:bunter^5.997396E-4 
> value_5:bunzer^5.997396E-4 value_5:burker^5.997396E-4 
> value_5:busker^5.997396E-4) +othervalue_5:hill)^0.5714286), product of:
>        1.0 = boost
>        0.0 = queryNorm
>
> ...
>
>    0.0069078947 = coord(21/3040)
>
> It looks like the PRODUCT_OF and SUM_OF nodes, which represent the Boolean
> logic, do not actually apply the boost?
>
> Karl
>
>
>
> -----Original Message-----
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of ext Yonik 
> Seeley
> Sent: Thursday, January 20, 2011 2:36 PM
> To: dev@lucene.apache.org
> Subject: Re: Odd Boolean scoring behavior?
>
> On Thu, Jan 20, 2011 at 2:17 PM,  <karl.wri...@nokia.com> wrote:
>> The problem is that the LANGUAGE_BOOST boost doesn't seem to be having any
>> effect.  I can change it all over the place, and nothing much changes.
>
> Then perhaps your language term doesn't actually match anything in the
> index?  (i.e. how is it analyzed?)
> Next step would be to get score explanations (just add debugQuery=true
> if you're using Solr, or see IndexSearcher.explain() if not).
>
> -Yonik
> http://www.lucidimagination.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>