[jira] [Comment Edited] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is use

Uwe Schindler (JIRA) Tue, 11 Sep 2012 11:22:09 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453245#comment-13453245 ]


Uwe Schindler edited comment on LUCENE-2684 at 9/12/12 5:20 AM:
----------------------------------------------------------------

It does not only affect freq(). In my case it was "retrieving the subquery 
score"...

bq. EG someday we could make BS1 score docs in order (it is possible, just not 
sure it'd be performant), and then this workaround no longer works.

But with in-order scoring we use correctly positioned scorers in all cases; 
otherwise it is a bug (like the DisjunctionSumScorer bug in 3.6 and 4.0 that we 
fixed recently). So returning "false" works around the issue for now, and it 
would not hurt if somebody kept returning false even once a new BS1 can score 
in order. On the other hand, if BS1 scored in order but did not position its 
sub-scorers correctly, that would clearly be a bug!
                
      was (Author: thetaphi):
    It does not only affect score(). In my case it was "retrieving the subquery 
score"...

bq. EG someday we could make BS1 score docs in order (it is possible, just not 
sure it'd be performant), and then this workaround no longer works.

But with in-order scoring we use correctly positioned scorers in all cases; 
otherwise it is a bug (like the DisjunctionSumScorer bug in 3.6 and 4.0 that we 
fixed recently). So returning "false" works around the issue for now, and it 
would not hurt if somebody kept returning false even once a new BS1 can score 
in order. On the other hand, if BS1 scored in order but did not position its 
sub-scorers correctly, that would clearly be a bug!
                  
> it's not possible to access sub-query's freq information if BooleanScorer is 
> use
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-2684
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2684
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>            Reporter: Michael McCandless
>             Fix For: 4.1
>
>
> LUCENE-2590 added an advanced feature, allowing an app to gather all 
> sub-scorers for any Query.
> This is powerful because then, during collection, the app can get some 
> details about how each sub-query "participated" in the overall match for the 
> given document.
> However, I think this is completely broken if the BooleanQuery uses 
> BooleanScorer, because that scorer is not doc-at-once.  Instead, it batch 
> processes chunks of 2048 sequential docIDs per scorer.  This is a big 
> performance gain, but it means that the sub scorers will all be positioned to 
> the end of the 2048 doc chunk while the docs that matched within that chunk 
> are collected.
> I don't think we can easily fix this... likely the "fix" is to make it 
> easy(ier) to force BQ to use BooleanScorer2 (which is doc-at-once)?  It is 
> actually possible to force this, today, by having your collector return false 
> from acceptsDocsOutOfOrder...
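
To make the positioning problem described above concrete, here is a minimal toy model in plain Java. It is only a sketch and does not use Lucene classes: ToySubScorer, collectChunked and collectDocAtOnce are hypothetical names invented for illustration, and the only detail taken from the issue is the 2048-doc window. It records, for every collected doc, which doc the sub-scorer is positioned on at collect time.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedScoringDemo {
    // Window size mentioned in the issue description.
    static final int CHUNK = 2048;
    // Sentinel meaning "iterator exhausted" in this toy model.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical stand-in for a sub-scorer: walks a sorted array of docIDs. */
    static final class ToySubScorer {
        private final int[] docs;
        private int pos = -1;

        ToySubScorer(int... docs) { this.docs = docs; }

        int docID() {
            if (pos < 0) return -1;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        int nextDoc() { pos++; return docID(); }
    }

    /** Batch style: drain a whole window first, then collect its hits. */
    static List<int[]> collectChunked(ToySubScorer sub) {
        List<int[]> seen = new ArrayList<>();
        sub.nextDoc();
        while (sub.docID() != NO_MORE_DOCS) {
            int windowEnd = (sub.docID() / CHUNK + 1) * CHUNK;
            List<Integer> bucket = new ArrayList<>();
            // Phase 1: the sub-scorer is advanced past every doc in the window.
            while (sub.docID() < windowEnd) {
                bucket.add(sub.docID());
                sub.nextDoc();
            }
            // Phase 2: hits are collected, but the sub-scorer has already moved on.
            for (int doc : bucket) {
                seen.add(new int[] {doc, sub.docID()});
            }
        }
        return seen;
    }

    /** Doc-at-once style: collect each hit while the sub-scorer still sits on it. */
    static List<int[]> collectDocAtOnce(ToySubScorer sub) {
        List<int[]> seen = new ArrayList<>();
        for (int doc = sub.nextDoc(); doc != NO_MORE_DOCS; doc = sub.nextDoc()) {
            seen.add(new int[] {doc, sub.docID()});
        }
        return seen;
    }

    public static void main(String[] args) {
        for (int[] p : collectChunked(new ToySubScorer(3, 10, 5000))) {
            System.out.println("chunked: collected doc " + p[0] + ", sub-scorer on " + p[1]);
        }
        for (int[] p : collectDocAtOnce(new ToySubScorer(3, 10, 5000))) {
            System.out.println("doc-at-once: collected doc " + p[0] + ", sub-scorer on " + p[1]);
        }
    }
}
```

In the chunked variant, docs 3 and 10 are collected while the toy sub-scorer is already on doc 5000, so anything read from it at that point (freq, score) would belong to the wrong document. In the doc-at-once variant the two positions always agree, which is the behaviour the workaround relies on.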

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

