[jira] Commented: (LUCENE-1630) Mating Collector and Scorer on doc Id orderness

Michael McCandless (JIRA) Wed, 17 Jun 2009 03:46:29 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720607#action_12720607
 ]


Michael McCandless commented on LUCENE-1630:
--------------------------------------------

{quote}
bq. Can we make Collector.supportsDocsOutOfOrder abstract? Defaulting to false 
isn't great (I'd rather subclass think about the question).

In general, I tried to avoid it since that would require changing all core 
Collectors. There aren't many, but still ...

This goes for QueryWeight.scoresOutOfOrder - wanted to avoid changing all core 
Weights to impl the method w/ "return false". I actually think that many 
Weights/Scorers do score documents in-order, hence the default impl.
{quote}

OK... thinking more about it, I think having
QueryWeight.scoresDocsOutOfOrder default to "false" is reasonable (I
think most do in-order scoring).  Also, I think the perf gains are
relatively small if a QueryWeight returns "true", so, by defaulting to
false we're not leaving much performance on the table.

But for Collector it's a different story: the gains by allowing
BooleanQuery to use its out-of-order scorer are sizable.  And, I'd
expect many custom Collectors would be fine with out-of-order
collection.

Since these are brand new classes, we have the chance to do it well.
It's very much an expert thing already to make your own Collector...

{quote}
bq. If a given Scorer.scoresOutOfOrder returns true, does that mean nextDoc is 
allowed to return docs out of order?

When you deal with a Scorer which returns out-of-order, you can only call 
scorer.score(Collector). If you're going to iterate, you're going to have to 
create a Scorer in-order, and that's what IndexSearcher does. I'll spell it out 
clearly in the javadocs.
{quote}

That may be a bit too strong -- eg BooleanScorer lets you nextDoc()
your way through its out-of-order docs (just not advance()).  Maybe
state just that you can't use advance in the javadocs?

{quote}
bq. Should scoresOutOfOrder() move from QueryWeight --> Scorer?

We've discussed it few posts up. When this information in in Scorer, I should 
first ask for a Scorer, and only then I can create a Collector. If I'll use the 
Scorer immediately, then that'll be ok. However, that's not the case in 
IndexSearcher, and results in a bug in Spatial, and unless we want to uglify 
IndexSearcher code, it seemed that this can sit in QueryWeight.

But I do think it's a problematic method in QW too, since if it returns false 
by default, I'll create a Collector which expects docs in-order, but then I'd 
lose the optimization in BooleanWeight which may return an out-of-order 
superior Scorer. If I return true, I'll create a Collector which expects 
out-of-order, and the Scorer (again, an example from BW) may be actually 
in-order, and I've wasted unnecessary 'if doc > topDoc' cycles.

So I don't know what's better: make IndexSearcher code more complicated or 
sacrifice a potential loss of this optimization?
{quote}

Could we "invert" the logic in IndexSearcher that makes a collector,
eg by creating a utility class that will on-demand provide a collector
once told whether the docs will be in order?  Basically, "curry" all
the other details about the collector (sorting by score vs field, if
by field whether to track scores & max score).  Then inside doSearch
when we finally know if the Scorer will be in-order, we ask that
helper class for the collector?  The first time the helper class is
called, it makes the collector; subsequent times it returns the same
one.

There is a risk, though, if the Scorer returned for a given segment
"changes its mind"... eg the first segment's scorer says the docs will
be in order, and then some later segment's scorer says they will not
be in order.  So... that's risky.

Maybe we leave it on QueryWeight, but fix BooleanWeight to return
exactly the right thing?  (It can be exact, right?  Because we know
the conditions under which BooleanWeight, if allowed to do so, would
choose to return an out-of-order scorer).

{quote}
bq. Shouldn't Searchable cutover to QueryWeight too? (We are keeping 
Searchable, but allowing changes to it)

I wrote that above too - I don't think we can declare and execute right in 2.9 
that Searchable can be changed unexpectedly. So I added a NOTE to its javadocs 
and thought to do the change post 2.9, when we remove Weight. We'd be forced to 
change these methods to QueryWeight, and fix RemoteSearchable too. And it will 
be consistent w/ our back-compat policy (at least the part where we declare on 
an upcoming change before it happens).

But if you think otherwise, I don't mind deprecating and adding new methods 
(I've got used to it already, I almost do it blindly  ).
{quote}

[Sorry, I'm losing track of all the comments]

OK let's defer the changes to Searchable until 3.1.  Make sure you
open a follow-on issue so we remember ;)


> Mating Collector and Scorer on doc Id orderness
> -----------------------------------------------
>
>                 Key: LUCENE-1630
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1630
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>            Assignee: Michael McCandless
>             Fix For: 2.9
>
>         Attachments: LUCENE-1630.patch, LUCENE-1630.patch, LUCENE-1630.patch
>
>
> This is a spin off of LUCENE-1593. This issue proposes to expose appropriate 
> API on Scorer and Collector such that one can create an optimized Collector 
> based on a given Scorer's doc-id orderness and vice versa. Copied from 
> LUCENE-1593, here is the list of changes:
> # Deprecate Weight and create QueryWeight (abstract class) with a new 
> scorer(reader, scoreDocsInOrder), replacing the current scorer(reader) 
> method. QueryWeight implements Weight, while score(reader) calls 
> score(reader, false /* out-of-order */) and scorer(reader, scoreDocsInOrder) 
> is defined abstract.
> #* Also add QueryWeightWrapper to wrap a given Weight implementation. This 
> one will also be deprecated, as well as package-private.
> #* Add to Query variants of createWeight and weight which return QueryWeight. 
> For now, I prefer to add a default impl which wraps the Weight variant 
> instead of overriding in all Query extensions, and in 3.0 when we remove the 
> Weight variants - override in all extending classes.
> # Add to Scorer isOutOfOrder with a default to false, and override in BS to 
> true.
> # Modify BooleanWeight to extend QueryWeight and implement the new scorer 
> method to return BS2 or BS based on the number of required scorers and 
> setAllowOutOfOrder.
> # Add to Collector an abstract _acceptsDocsOutOfOrder_ which returns 
> true/false.
> #* Use it in IndexSearcher.search methods, that accept a Collector, in order 
> to create the appropriate Scorer, using the new QueryWeight.
> #* Provide a static create method to TFC and TSDC which accept this as an 
> argument and creates the proper instance.
> #* Wherever we create a Collector (TSDC or TFC), always ask for out-of-order 
> Scorer and check on the resulting Scorer isOutOfOrder(), so that we can 
> create the optimized Collector instance.
> # Modify IndexSearcher to use all of the above logic.
> The only class I'm worried about, and would like to verify with you, is 
> Searchable. If we want to deprecate all the search methods on IndexSearcher, 
> Searcher and Searchable which accept Weight and add new ones which accept 
> QueryWeight, we must do the following:
> * Deprecate Searchable in favor of Searcher.
> * Add to Searcher the new QueryWeight variants. Here we have two choices: (1) 
> break back-compat and add them as abstract (like we've done with the new 
> Collector method) or (2) add them with a default impl to call the Weight 
> versions, documenting these will become abstract in 3.0.
> * Have Searcher extend UnicastRemoteObject and have RemoteSearchable extend 
> Searcher. That's the part I'm a little bit worried about - Searchable 
> implements java.rmi.Remote, which means there could be an implementation out 
> there which implements Searchable and extends something different than 
> UnicastRemoteObject, like Activeable. I think there is very small chance this 
> has actually happened, but would like to confirm with you guys first.
> * Add a deprecated, package-private, SearchableWrapper which extends Searcher 
> and delegates all calls to the Searchable member.
> * Deprecate all uses of Searchable and add Searcher instead, defaulting the 
> old ones to use SearchableWrapper.
> * Make all the necessary changes to IndexSearcher, MultiSearcher etc. 
> regarding overriding these new methods.
> One other optimization that was discussed in LUCENE-1593 is to expose a 
> topScorer() API (on Weight) which returns a Scorer that its score(Collector) 
> will be called, and additionally add a start() method to DISI. That will 
> allow Scorers to initialize either on start() or score(Collector). This was 
> proposed mainly because of BS and BS2 which check if they are initialized in 
> every call to next(), skipTo() and score(). Personally I prefer to see that 
> in a separate issue, following that one (as it might add methods to 
> QueryWeight).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] Commented: (LUCENE-1630) Mating Collector and Scorer on doc Id orderness

Reply via email to