[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

Tim Smith (JIRA) Thu, 20 Aug 2009 06:07:43 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745436#action_12745436
 ]


Tim Smith commented on LUCENE-1821:
-----------------------------------

true, MultiSearcher does kink things up some (and the Searcher abstract class 
in general)

personally, this is not a problem for me (I don't use MultiSearcher, at least 
not yet), and I'm happy with being passed the IndexSearcher instance that 
directly contains the IndexReader I'm being passed

The contract could state that the Searcher provided is the direct container 
of the IndexReader also passed
at which point, both explain() and scorer() would be "accurate" in this respect

I would almost like to see something different passed in instead of a 
Searcher/IndexReader pair

I would actually like to see a "SearchContext" sort of object passed in
this would represent the whole "tree" of Searchers/IndexReaders
this would allow access to the MultiSearcher, the direct IndexSearcher, and the 
sub IndexReader (which should actually be used for the scoring), as well as any 
other Searchers in the call stack
this SearchContext could also pass in the "topScorer/allowDocsInOrder" flags 
(but that would be more difficult as scorers have subscorers that need to 
sometimes be created with different flags for these), but this SearchContext 
could be used to pass more information throughout the Scorer API in general 
from the top level (like - always use constant score queries where possible, 
use scoring algorithm X, Y, or Z, and so on)

(obviously this would impact the API of Searcher a good deal, as it would have 
to maintain this stack as the sub Searchers' search() methods are called)
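
To make the "SearchContext" idea a bit more concrete, here is a minimal sketch in plain Java. Nothing in it is Lucene API: the class name, the String stand-ins for Searchers/IndexReaders, the per-level docBase, and the docsInOrder flag are all assumptions made up for illustration. It only shows how a chain of contexts could carry the call stack plus top-level flags down to the level where scoring happens.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical context for one level of the Searcher/IndexReader "tree" (not a Lucene class). */
final class SearchContext {
    private final SearchContext parent;  // null at the top level
    private final String name;           // stand-in for the Searcher or IndexReader at this level
    private final int docBase;           // offset of this level's doc ids within its parent
    private final boolean docsInOrder;   // example of a flag set once at the top level

    private SearchContext(SearchContext parent, String name, int docBase, boolean docsInOrder) {
        this.parent = parent;
        this.name = name;
        this.docBase = docBase;
        this.docsInOrder = docsInOrder;
    }

    /** Creates the top-level context (for example, for a MultiSearcher). */
    static SearchContext top(String name, boolean docsInOrder) {
        return new SearchContext(null, name, 0, docsInOrder);
    }

    /** Creates a context one level down; flags are inherited from the parent. */
    SearchContext child(String name, int docBase) {
        return new SearchContext(this, name, docBase, this.docsInOrder);
    }

    /** Doc offset relative to the top of the tree: the sum of the bases up the chain. */
    int absoluteDocBase() {
        int base = 0;
        for (SearchContext c = this; c != null; c = c.parent) {
            base += c.docBase;
        }
        return base;
    }

    /** Names from the top level down to this level, i.e. the "call stack". */
    List<String> path() {
        List<String> names = new ArrayList<>();
        for (SearchContext c = this; c != null; c = c.parent) {
            names.add(c.name);
        }
        Collections.reverse(names);
        return names;
    }

    boolean docsInOrder() {
        return docsInOrder;
    }
}
```

With something like this, a scorer handed the context for a sub reader could call absoluteDocBase() to rebase its doc ids and path() to reach any enclosing level. A real version would hold Searcher/IndexReader references rather than names, and would need some way to override flags per subscorer.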

> Weight.scorer() not passed doc offset for "sub reader"
> ------------------------------------------------------
>
>                 Key: LUCENE-1821
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1821
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Tim Smith
>         Attachments: LUCENE-1821.patch
>
>
> Now that searching is done on a per segment basis, there is no way for a 
> Scorer to know the "actual" doc id for the documents it matches (only the 
> relative doc offset into the segment)
> If using caches in your scorer that are based on the "entire" index (all 
> segments), there is now no way to index into them properly from inside a 
> Scorer because the scorer is not passed the needed offset to calculate the 
> "real" docid
> suggest having Weight.scorer() method also take an integer for the doc offset
> Abstract Weight class should have a constructor that takes this offset as 
> well as a method to get the offset
> All Weights that have "sub" weights must pass this offset down to created 
> "sub" weights
> Details on workaround:
> In order to work around this, you must do the following:
> * Subclass IndexSearcher
> * Add "int getIndexReaderBase(IndexReader)" method to your subclass
> * during Weight creation, the Weight must hold onto a reference to the passed 
> in Searcher (cast to your subclass)
> * during Scorer creation, the Scorer must be passed the result of 
> YourSearcher.getIndexReaderBase(reader)
> * Scorer can now rebase any collected docids using this offset
> Example implementation of getIndexReaderBase():
> {code}
> // NOTE: a more efficient implementation is possible if you cache the result
> // of gatherSubReaders in your constructor
> public int getIndexReaderBase(IndexReader reader) {
>   if (reader == getReader()) {
>     return 0;
>   } else {
>     List readers = new ArrayList();
>     gatherSubReaders(readers);
>     Iterator iter = readers.iterator();
>     int maxDoc = 0;
>     while (iter.hasNext()) {
>       IndexReader r = (IndexReader)iter.next();
>       if (r == reader) {
>         return maxDoc;
>       } 
>       maxDoc += r.maxDoc();
>     } 
>   }
>   return -1; // reader not in searcher
> }
> {code}
> Notes:
> * This workaround makes it so you cannot serialize your custom Weight 
> implementation
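
The caching mentioned in the NOTE above, together with the rebasing step, could be sketched roughly as follows. This is plain Java with no Lucene dependency; the class and method names (DocBases, docStarts, rebase) are made up for illustration, and the maxDoc values are assumed to have been read from the sub readers in order. The start offsets are computed once, so each later lookup is a single array access instead of a walk over the sub readers.

```java
/** Hypothetical helper (not a Lucene class): precomputed doc offsets for a list of sub readers. */
final class DocBases {
    private DocBases() {}

    /**
     * Given maxDoc() for each sub reader in order, returns the starting
     * "real" doc id of each one: starts[i] = maxDocs[0] + ... + maxDocs[i-1].
     */
    static int[] docStarts(int[] maxDocs) {
        int[] starts = new int[maxDocs.length];
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i] = base;
            base += maxDocs[i];
        }
        return starts;
    }

    /** Converts a segment-relative doc id into an index-wide doc id. */
    static int rebase(int[] starts, int subReaderIndex, int segmentDocId) {
        return starts[subReaderIndex] + segmentDocId;
    }
}
```

For sub readers with maxDoc values 10, 5, and 20, docStarts gives 0, 10, and 15, so doc 3 of the third sub reader is doc 18 in the whole index.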

