[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

Mark Miller (JIRA) Fri, 21 Aug 2009 07:48:40 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745997#action_12745997 ]


Mark Miller commented on LUCENE-1821:
-------------------------------------

It's org.apache.lucene.search.QueryWrapperFilter. And technically we have to 
account for subclasses, combinations, and anything else possible.

This being the release with the break, who knows, though. I can't reasonably see 
releasing without notes indicating that you *must* recompile. And while we want 
to limit how much work must be done (we also consider the likely impact), this 
would be the time to skirt through.

Pretty much depends on what McCandless weighs in with, I guess - unless a new 
spectator pops up.
{code}
getDocIdSet(IndexReader) : DocIdSet - org.apache.lucene.search.QueryWrapperFilter
        bits(IndexReader) : BitSet - org.apache.lucene.search.BooleanFilterTest.getOldBitSetFilter(...).new Filter() {...}
        ConstantScorer(Similarity, IndexReader, Weight) - org.apache.lucene.search.ConstantScoreQuery.ConstantScorer
        explain(Searcher, IndexReader, int) : Explanation - org.apache.lucene.search.FilteredQuery.createWeight(...).new Weight() {...}
        getDISI(ArrayList, int, IndexReader) : DocIdSetIterator - org.apache.lucene.search.BooleanFilter
        getDocIdSet(IndexReader) : DocIdSet - org.apache.lucene.search.BooleanFilter (3 matches)
        getDocIdSet(IndexReader) : DocIdSet - org.apache.lucene.search.CachingWrapperFilter
        getDocIdSet(IndexReader) : DocIdSet - org.apache.lucene.search.CachingWrapperFilterHelper
        getDocIdSet(IndexReader) : DocIdSet - org.apache.lucene.search.RemoteCachingWrapperFilter
        getDocIdSet(IndexReader) : DocIdSet - org.apache.lucene.search.RemoteCachingWrapperFilterHelper
        scorer(IndexReader, boolean, boolean) : Scorer - org.apache.lucene.search.FilteredQuery.createWeight(...).new Weight() {...}
        searchWithFilter(IndexReader, Weight, Filter, Collector) : void - org.apache.lucene.search.IndexSearcher
        tstFilterCard(String, int, Filter) : void - org.apache.lucene.search.BooleanFilterTest
{code}

> Weight.scorer() not passed doc offset for "sub reader"
> ------------------------------------------------------
>
>                 Key: LUCENE-1821
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1821
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Tim Smith
>             Fix For: 2.9
>
>         Attachments: LUCENE-1821.patch
>
>
> Now that searching is done on a per-segment basis, there is no way for a 
> Scorer to know the "actual" doc id for the documents it matches (only the 
> relative doc offset into the segment).
> If using caches in your scorer that are based on the "entire" index (all 
> segments), there is now no way to index into them properly from inside a 
> Scorer because the scorer is not passed the needed offset to calculate the 
> "real" docid.
> Suggest having the Weight.scorer() method also take an integer for the doc 
> offset.
> The abstract Weight class should have a constructor that takes this offset as 
> well as a method to get the offset.
> All Weights that have "sub" weights must pass this offset down to created 
> "sub" weights.
> Details on workaround:
> In order to work around this, you must do the following:
> * Subclass IndexSearcher
> * Add "int getIndexReaderBase(IndexReader)" method to your subclass
> * During Weight creation, the Weight must hold onto a reference to the passed 
> in Searcher (cast to your subclass)
> * During Scorer creation, the Scorer must be passed the result of 
> YourSearcher.getIndexReaderBase(reader)
> * The Scorer can now rebase any collected docids using this offset
> Example implementation of getIndexReaderBase():
> {code}
> // NOTE: a more efficient implementation is possible if you cache the result
> // of gatherSubReaders in your constructor
> public int getIndexReaderBase(IndexReader reader) {
>   if (reader == getReader()) {
>     return 0;
>   } else {
>     List readers = new ArrayList();
>     gatherSubReaders(readers);
>     Iterator iter = readers.iterator();
>     int maxDoc = 0;
>     while (iter.hasNext()) {
>       IndexReader r = (IndexReader)iter.next();
>       if (r == reader) {
>         return maxDoc;
>       } 
>       maxDoc += r.maxDoc();
>     } 
>   }
>   return -1; // reader not in searcher
> }
> {code}
> Notes:
> * This workaround makes it so you cannot serialize your custom Weight 
> implementation
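
For illustration only, here is a minimal sketch of the caching idea mentioned in the NOTE of the quoted example: compute each sub reader's doc id base once and look it up later, instead of walking the reader list on every call. Nothing here is Lucene API. SegmentBases, baseOf() and rebase() are hypothetical names, readers are treated as opaque objects compared by identity (as the quoted code does with ==), and the maxDoc values are passed in as a plain array so the sketch stays self-contained. Unlike the quoted method, it does not special-case the top-level reader.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of Lucene): precomputes the doc id base of
 * each sub reader once, so later lookups need no walk over the reader list.
 */
final class SegmentBases<R> {
    // Identity-keyed, mirroring the `r == reader` comparison in the quoted example.
    private final Map<R, Integer> bases = new IdentityHashMap<>();

    /**
     * @param readers sub readers in index order
     * @param maxDocs maxDocs[i] is assumed to be the maxDoc() of readers.get(i)
     */
    SegmentBases(List<R> readers, int[] maxDocs) {
        if (readers.size() != maxDocs.length) {
            throw new IllegalArgumentException("readers and maxDocs differ in length");
        }
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases.put(readers.get(i), base); // base = sum of maxDoc of earlier readers
            base += maxDocs[i];
        }
    }

    /** Returns the doc id base of a sub reader, or -1 if it is not known. */
    int baseOf(R reader) {
        Integer base = bases.get(reader);
        return base == null ? -1 : base;
    }

    /** Converts a segment-relative doc id into an index-wide doc id. */
    int rebase(R reader, int segmentDoc) {
        int base = baseOf(reader);
        if (base < 0) {
            throw new IllegalArgumentException("reader not in searcher");
        }
        return base + segmentDoc;
    }
}
```

In the workaround described above, an instance like this would presumably be built once in the IndexSearcher subclass constructor from the gathered sub readers, and getIndexReaderBase(reader) would delegate to baseOf(reader).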

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
