[jira] Commented: (LUCENE-1821) Weight.scorer() not passed doc offset for "sub reader"

Tim Smith (JIRA) Fri, 21 Aug 2009 06:16:44 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745960#action_12745960
 ]


Tim Smith commented on LUCENE-1821:
-----------------------------------

bq. You never officially had the full index context
Officially, I didn't "not" have the full index context either. It was undefined 
at best, but it was clear from both the Lucene code and my use of the API that I 
did have the full index context.

Whenever I do a search, I always explicitly know what context I'm searching in 
(it's always an IndexSearcher context).
Further, whenever I pass an IndexReader to any method (to create a cache, etc.), 
I explicitly know what context I'm dealing with, and therefore what the docids 
used mean.
As the application developer, I have full control over what I pass into the 
Lucene API and where, and I know the context I'm passing it in. The javadoc 
should just be fully clear on how what goes in is used (if it isn't already). 
I always have the option of not using a utility class/method provided by Lucene 
if it does not have the context semantics I need (and can write my own that does).

bq. The current API would not support this without back compat breaks up the 
wazoo
I kind of see what you mean here, but by the same reasoning, how is it OK to 
pass an IndexReader to this method?
If it's OK to pass the IndexReader, it seems it should also be OK to pass 
Weight.scorer() the IndexSearcher that is the direct context for that 
IndexReader. The scorer() method's interface was already changed between 2.4 
and 2.9 (adding allowDocsInOrder and topScorer).

bq. You can pick, but we have to be true to the API or change it (not easy with 
our back compat policies)
To be fair, 2.9 has a lot of back compat breaks, both in API and in runtime 
behavior. I had tons of compile errors when I dropped 2.9 in, plus some other 
hacks I had to add (at least temporarily) to get 2.9 to work because of 
runtime changes (primarily this per-segment search stuff).

I have no problem with back compat breaks in general. It only took me about a 
day to absorb 2.9 initially (I'm still working on fully taking advantage of new 
features and getting rid of deprecated class use). The only requirement I would 
put on a back compat break is that it have a workaround to get back the 
previous version's behavior (in this case, make it possible to remap the docids 
to the "IndexSearcher" context inside the scorer).
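
For illustration, the remapping could look roughly like this. This is a minimal 
sketch, not Lucene code: SegmentDocBases is a hypothetical helper, and segments 
are modeled only by their maxDoc counts rather than by real IndexReader 
instances. It assumes segments are searched in the same order in which they 
appear in the top-level reader.

```java
/**
 * Hypothetical helper (not part of Lucene): converts a segment-relative
 * docid into a docid in the top-level "IndexSearcher" context.
 */
final class SegmentDocBases {
    // bases[i] = total number of docs in all segments before segment i
    private final int[] bases;

    /** @param maxDocs the maxDoc() of each segment, in search order */
    SegmentDocBases(int[] maxDocs) {
        bases = new int[maxDocs.length];
        int running = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = running;
            running += maxDocs[i];
        }
    }

    /** Doc offset of the given segment within the whole index. */
    int base(int segment) {
        return bases[segment];
    }

    /** Rebase a segment-relative docid to the top-level context. */
    int toTopLevel(int segment, int segmentDocId) {
        return bases[segment] + segmentDocId;
    }
}
```

With segments of 3, 5 and 2 docs, the bases are 0, 3 and 8, so docid 4 in the 
second segment maps to top-level docid 7, which is the value a scorer would use 
to index into a cache built over the entire index.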



> Weight.scorer() not passed doc offset for "sub reader"
> ------------------------------------------------------
>
>                 Key: LUCENE-1821
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1821
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>    Affects Versions: 2.9
>            Reporter: Tim Smith
>             Fix For: 2.9
>
>         Attachments: LUCENE-1821.patch
>
>
> Now that searching is done on a per-segment basis, there is no way for a 
> Scorer to know the "actual" doc id for the documents it matches (only the 
> relative doc offset into the segment).
> If you use caches in your scorer that are based on the "entire" index (all 
> segments), there is now no way to index into them properly from inside a 
> Scorer, because the scorer is not passed the offset needed to calculate the 
> "real" docid.
> Suggest having the Weight.scorer() method also take an integer for the doc 
> offset.
> The abstract Weight class should have a constructor that takes this offset, as 
> well as a method to get the offset.
> All Weights that have "sub" weights must pass this offset down to the created 
> "sub" weights.
> Details on workaround:
> In order to work around this, you must do the following:
> * Subclass IndexSearcher
> * Add "int getIndexReaderBase(IndexReader)" method to your subclass
> * during Weight creation, the Weight must hold onto a reference to the 
> passed-in Searcher (cast to your subclass)
> * during Scorer creation, the Scorer must be passed the result of 
> YourSearcher.getIndexReaderBase(reader)
> * Scorer can now rebase any collected docids using this offset
> Example implementation of getIndexReaderBase():
> {code}
> // NOTE: a more efficient implementation is possible if you cache the result
> // of gatherSubReaders in your constructor
> public int getIndexReaderBase(IndexReader reader) {
>   if (reader == getReader()) {
>     return 0;
>   } else {
>     List readers = new ArrayList();
>     gatherSubReaders(readers);
>     Iterator iter = readers.iterator();
>     int maxDoc = 0;
>     while (iter.hasNext()) {
>       IndexReader r = (IndexReader)iter.next();
>       if (r == reader) {
>         return maxDoc;
>       } 
>       maxDoc += r.maxDoc();
>     } 
>   }
>   return -1; // reader not in searcher
> }
> {code}
> Notes:
> * This workaround makes it so you cannot serialize your custom Weight 
> implementation
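
The NOTE in the example above mentions caching the sub reader list in the 
constructor. A minimal sketch of that idea follows, under stated assumptions: 
ReaderBaseCache is a hypothetical class, the reader type R and the maxDoc 
function stand in for IndexReader and IndexReader.maxDoc(), and the syntax is 
modern Java (generics, lambdas) rather than what Lucene 2.9 targets. Readers are 
compared by identity, like the `r == reader` check in the original. Unlike the 
original, the sketch does not special-case the top-level reader.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

/**
 * Hypothetical cache (not part of Lucene): computes every sub reader's doc
 * base once, so later lookups do not have to walk the sub reader list.
 */
final class ReaderBaseCache<R> {
    // Keyed by object identity, matching the "r == reader" comparison
    private final Map<R, Integer> baseByReader = new IdentityHashMap<>();

    /**
     * @param subReaders sub readers in index order (what gatherSubReaders
     *                   would collect in the original example)
     * @param maxDoc     stand-in for IndexReader.maxDoc()
     */
    ReaderBaseCache(List<R> subReaders, ToIntFunction<R> maxDoc) {
        int base = 0;
        for (R reader : subReaders) {
            baseByReader.put(reader, base);
            base += maxDoc.applyAsInt(reader);
        }
    }

    /** Returns the doc base for the reader, or -1 if it is not known. */
    int baseOf(R reader) {
        Integer base = baseByReader.get(reader);
        return base == null ? -1 : base;
    }
}
```

A searcher subclass could build one such cache in its constructor and have 
getIndexReaderBase() delegate to baseOf(), keeping the explicit check for the 
top-level reader in front of the lookup.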

