[jira] Commented: (LUCENE-1630) Mating Collector and Scorer on doc Id orderness

Shai Erera (JIRA) Mon, 15 Jun 2009 05:15:36 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719535#action_12719535
 ]


Shai Erera commented on LUCENE-1630:
------------------------------------

Ok I was just about to post the patch, when the Spatial tests failed. After 
some investigation, I found out the following, and would appreciate your 
suggestions. IndexSearcher.search(QueryWeight weight, Filter filter, final int 
nDocs, Sort sort, boolean fillFields) I wrote the following code:

{code}
    // try to create a Scorer in out-of-order mode, just to know which TFC
    // version to instantiate.
    boolean docsScoredInOrder = false;
    if (subReaders.length > 0) {
      docsScoredInOrder = !weight.scorer(subReaders[0], false, 
false).scoresOutOfOrder();
    }
    TopFieldCollector collector = TopFieldCollector.create(sort, nDocs,
        fillFields, fieldSortDoTrackScores, fieldSortDoMaxScore, 
docsScoredInOrder);
    search(weight, filter, collector);
{code}

For clarification - I need to know which TFC instance to create (in-order / 
out-of-order). For that, I need to first create a Scorer, asking for 
out-of-order one, and then check whether the Scorer is indeed an out-of-order 
or not. That's a dummy Scorer, as I never use it afterwards, but since we 
didn't want to add scoresOutOfOrder to Weight, but Scorer, I don't have any 
other choice.

For Spatial, this creates a problem. One of the tests uses ConstantScoreQuery 
and passes in a Filter. CSQ.scorer() creates a new Scorer and uses the given 
Filter as reference. In Spatial, every time Filter.getDocIdSet() is called, the 
internal filter populates a WeakHashMap of distances (with the doc id as key), 
and doesn't clear it between invocations. It also updates the "base" of the key 
to handle multiple readers. Therefore the docs of the first reader are added 
twice - once for the dummy invocation and the second time since the "base" is 
updated (LatLongDistanceFilter.java, line 222) to reader.maxDoc().

I tried to create a new "distances" map on every invocation, but then another 
test fails. I don't know this code very well, and I don't know which is the 
best solution:

* Complicate the code in IndexSearcher to create a Scorer, then collect it and 
then proceed w/ iterating on the readers, from the 2nd forward. This is a real 
ugly change, I tried it and quickly reverted. It also breaks the current beauty 
of having all the search methods call search(Weight, Filter, Collector).

* Fix LatLongDistanceFilter code to check if reader.maxDoc() == nextOffset, 
then do nextOffset -= reader.maxDoc(). This is not pretty either, since it 
assumes a certain implementation and use of it, which I don't like either.

* Add scoresOutOfOrder to Weight, but I don't know if we want to add this 
"knowledge" to Weight, and it fits nicely in Scorer.

Any suggestions? perhaps a different fix to Spatial?

> Mating Collector and Scorer on doc Id orderness
> -----------------------------------------------
>
>                 Key: LUCENE-1630
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1630
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>
> This is a spin off of LUCENE-1593. This issue proposes to expose appropriate 
> API on Scorer and Collector such that one can create an optimized Collector 
> based on a given Scorer's doc-id orderness and vice versa. Copied from 
> LUCENE-1593, here is the list of changes:
> # Deprecate Weight and create QueryWeight (abstract class) with a new 
> scorer(reader, scoreDocsInOrder), replacing the current scorer(reader) 
> method. QueryWeight implements Weight, while score(reader) calls 
> score(reader, false /* out-of-order */) and scorer(reader, scoreDocsInOrder) 
> is defined abstract.
> #* Also add QueryWeightWrapper to wrap a given Weight implementation. This 
> one will also be deprecated, as well as package-private.
> #* Add to Query variants of createWeight and weight which return QueryWeight. 
> For now, I prefer to add a default impl which wraps the Weight variant 
> instead of overriding in all Query extensions, and in 3.0 when we remove the 
> Weight variants - override in all extending classes.
> # Add to Scorer isOutOfOrder with a default to false, and override in BS to 
> true.
> # Modify BooleanWeight to extend QueryWeight and implement the new scorer 
> method to return BS2 or BS based on the number of required scorers and 
> setAllowOutOfOrder.
> # Add to Collector an abstract _acceptsDocsOutOfOrder_ which returns 
> true/false.
> #* Use it in IndexSearcher.search methods, that accept a Collector, in order 
> to create the appropriate Scorer, using the new QueryWeight.
> #* Provide a static create method to TFC and TSDC which accept this as an 
> argument and creates the proper instance.
> #* Wherever we create a Collector (TSDC or TFC), always ask for out-of-order 
> Scorer and check on the resulting Scorer isOutOfOrder(), so that we can 
> create the optimized Collector instance.
> # Modify IndexSearcher to use all of the above logic.
> The only class I'm worried about, and would like to verify with you, is 
> Searchable. If we want to deprecate all the search methods on IndexSearcher, 
> Searcher and Searchable which accept Weight and add new ones which accept 
> QueryWeight, we must do the following:
> * Deprecate Searchable in favor of Searcher.
> * Add to Searcher the new QueryWeight variants. Here we have two choices: (1) 
> break back-compat and add them as abstract (like we've done with the new 
> Collector method) or (2) add them with a default impl to call the Weight 
> versions, documenting these will become abstract in 3.0.
> * Have Searcher extend UnicastRemoteObject and have RemoteSearchable extend 
> Searcher. That's the part I'm a little bit worried about - Searchable 
> implements java.rmi.Remote, which means there could be an implementation out 
> there which implements Searchable and extends something different than 
> UnicastRemoteObject, like Activeable. I think there is very small chance this 
> has actually happened, but would like to confirm with you guys first.
> * Add a deprecated, package-private, SearchableWrapper which extends Searcher 
> and delegates all calls to the Searchable member.
> * Deprecate all uses of Searchable and add Searcher instead, defaulting the 
> old ones to use SearchableWrapper.
> * Make all the necessary changes to IndexSearcher, MultiSearcher etc. 
> regarding overriding these new methods.
> One other optimization that was discussed in LUCENE-1593 is to expose a 
> topScorer() API (on Weight) which returns a Scorer that its score(Collector) 
> will be called, and additionally add a start() method to DISI. That will 
> allow Scorers to initialize either on start() or score(Collector). This was 
> proposed mainly because of BS and BS2 which check if they are initialized in 
> every call to next(), skipTo() and score(). Personally I prefer to see that 
> in a separate issue, following that one (as it might add methods to 
> QueryWeight).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] Commented: (LUCENE-1630) Mating Collector and Scorer on doc Id orderness

Reply via email to