[jira] [Commented] (LUCENE-2684) it's not possible to access sub-query's freq information if BooleanScorer is used

Michael McCandless (JIRA) Tue, 11 Sep 2012 11:28:10 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453262#comment-13453262
 ]


Michael McCandless commented on LUCENE-2684:
--------------------------------------------

The problem is that "scoreDocsInOrder" doesn't really capture what's necessary 
here (yes, it works today, but not necessarily tomorrow...).

I agree, Uwe: if we add a Collector.needsNavigation(), then even a "fixed" BS1 
that sorted the docIDs before collection would not be usable, since the subs 
will not be "on" the doc during collect().
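
To make that concrete, here is a toy model in plain Java. It is not Lucene 
code: SubScorer, WindowedScorer and the window size of 4 are made up for 
illustration. Hits reach the collection step in increasing docID order, yet 
the sub has already been advanced past the window, so in-order delivery alone 
says nothing about whether the collector can navigate the subs:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a sub-scorer: iterates a sorted array of docIDs. */
class SubScorer {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;
    private int idx = -1;

    SubScorer(int... docs) { this.docs = docs; }

    /** Current position: -1 before the first nextDoc(), NO_MORE_DOCS when exhausted. */
    int docID() {
        if (idx < 0) return -1;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    int nextDoc() {
        idx++;
        return docID();
    }
}

/** Toy windowed scorer: drains the sub into a window, then collects. */
class WindowedScorer {
    /** Returns one "hitDoc:subDocID" string per collected hit. */
    static List<String> collectAll(SubScorer sub, int windowSize) {
        List<String> seen = new ArrayList<>();
        int doc = sub.nextDoc();
        int windowEnd = windowSize;
        while (doc != SubScorer.NO_MORE_DOCS) {
            // Phase 1: buffer every hit that falls inside the current window.
            List<Integer> buffered = new ArrayList<>();
            while (doc < windowEnd) {
                buffered.add(doc);
                doc = sub.nextDoc();
            }
            // Phase 2: collect the buffered hits in docID order. By now the
            // sub is positioned on the first doc beyond the window.
            for (int hit : buffered) {
                seen.add(hit + ":" + sub.docID());
            }
            windowEnd += windowSize;
        }
        return seen;
    }
}

class WindowDemo {
    public static void main(String[] args) {
        // prints [1:5, 2:5, 5:9, 9:2147483647]
        System.out.println(WindowedScorer.collectAll(new SubScorer(1, 2, 5, 9), 4));
    }
}
```

Every hit is reported in order, but the sub's docID() never equals the hit 
being collected, which is exactly the situation a collector asking for 
per-sub freq information cannot handle.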

And I agree, Robert: the current booleans "topLevelScorer" and 
"scoreDocsInOrder", and then a new "needsNavigation", will make things rather 
confusing.  Really I think topLevelScorer should be strongly typed: the intent 
is to declare whether you will call Scorer.score(Collector) or whether you will 
call .nextDoc()/.score() ... they really should be different classes.
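
One way to picture that split, as a sketch only: PushScorer, PullScorer, 
HitCollector and ToyWeight below are hypothetical names, not existing Lucene 
API. The caller states its intent by which type it asks for, instead of 
passing a boolean:

```java
/** Hypothetical: the caller hands over a collector and the scorer drives the loop. */
interface PushScorer {
    void scoreAll(HitCollector collector);
}

/** Hypothetical: the caller drives the loop, so the scorer is always "on" a doc. */
interface PullScorer {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int nextDoc();
    float score();
}

/** Hypothetical minimal collector. */
interface HitCollector {
    void collect(int doc, float score);
}

/** Hypothetical query-side factory with one method per usage style. */
class ToyWeight {
    private final int[] docs;

    ToyWeight(int... docs) { this.docs = docs; }

    /** Doc-at-a-time view, for callers that need to navigate sub-scorers. */
    PullScorer pullScorer() {
        return new PullScorer() {
            private int idx = -1;

            public int nextDoc() {
                idx++;
                return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
            }

            // Constant score; a real scorer would compute this per doc.
            public float score() { return 1.0f; }
        };
    }

    /** Bulk view, free to batch internally. Here it simply wraps the pull view. */
    PushScorer pushScorer() {
        return collector -> {
            PullScorer pull = pullScorer();
            for (int doc = pull.nextDoc(); doc != PullScorer.NO_MORE_DOCS; doc = pull.nextDoc()) {
                collector.collect(doc, pull.score());
            }
        };
    }
}
```

In this model, a caller that wants to inspect sub-scorers would ask for 
pullScorer(), and a plain top-hits search would ask for pushScorer(); neither 
needs a flag to explain what it is going to do.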

If we don't think any other future scorer would want to score docs NOT in order 
... then maybe we should simply rename scoreDocsInOrder to needsNavigation?  
(Or scoreDocAtOnce, scoreDocAtATime, something else...).
                
> it's not possible to access sub-query's freq information if BooleanScorer is 
> used
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-2684
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2684
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>            Reporter: Michael McCandless
>             Fix For: 4.1
>
>
> LUCENE-2590 added an advanced feature, allowing an app to gather all 
> sub-scorers for any Query.
> This is powerful because then, during collection, the app can get some 
> details about how each sub-query "participated" in the overall match for the 
> given document.
> However, I think this is completely broken if the BooleanQuery uses 
> BooleanScorer, because that scorer is not doc-at-once.  Instead, it batch 
> processes chunks of 2048 sequential docIDs per scorer.  This is a big 
> performance gain, but it means that the sub scorers will all be positioned to 
> the end of the 2048 doc chunk while the docs that matched within that chunk 
> are collected.
> I don't think we can easily fix this... likely the "fix" is to make it 
> easy(ier) to force BQ to use BooleanScorer2 (which is doc-at-once)?  It is 
> actually possible to force this, today, by having your collector return false 
> from acceptsDocsOutOfOrder...
