Re: Use of AllTermDocs with custom scorer

Peter Keegan Mon, 16 Nov 2009 15:39:08 -0800

>Can you remap your external data to be per segment?

That would provide the tightest integration but would require a major
redesign. Currently, the external data is in a single file created by
reading a stored field after the Lucene index has been committed. Creating
this file is very fast with 2.9 (considering the cost of reading all those
stored fields).


>For your custom sort comparator, are you using FieldComparator?

I'm using the deprecated FieldSortedHitQueue. I started looking into
replacing it with FieldComparator, but it was much more involved than I had
expected, so I postponed. Also, this would only be a partial solution to a
query with a custom scorer and custom sorter.

>Failing these, Lucene currently visits the readers in index order.
>So, you could accumulate the docBase by adding up the reader.maxDoc()
>for each reader you've seen.  However, this may change in future
>Lucene releases.

This would work for the Scorer but not the Sorter, right?

>You could also, externally, build your own map from SegmentReader ->
>docBase, by calling IndexReader.getSequentialSubReaders() and stepping
>through adding up the maxDoc.  Then, in your search, you can lookup
>the SegmentReader you're working on to get the docBase?

I think this would work for both Scorer and Sorter, right?
This seems like the best solution right now.
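
A minimal sketch of that bookkeeping, in plain Java with no Lucene dependency so it can be run on its own. The class and method names (DocBases, computeDocBases, buildDocBaseMap, toIndexWide) are hypothetical. It assumes the sub-readers would come from IndexReader.getSequentialSubReaders(), in index order, with each maxDocs entry taken from the matching reader's maxDoc(). The map would need to be rebuilt after every reopen.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
final class DocBases {

    // bases[i] is the index-wide docId of the first document in segment i.
    // maxDoc (not numDocs) is summed because deleted docs still occupy docIds.
    static int[] computeDocBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int docBase = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = docBase;
            docBase += maxDocs[i];
        }
        return bases;
    }

    // Maps each sub-reader (compared by identity) to its docBase.
    // readers[i] and maxDocs[i] must describe the same segment.
    static <R> Map<R, Integer> buildDocBaseMap(R[] readers, int[] maxDocs) {
        if (readers.length != maxDocs.length) {
            throw new IllegalArgumentException("readers and maxDocs differ in length");
        }
        int[] bases = computeDocBases(maxDocs);
        Map<R, Integer> map = new IdentityHashMap<R, Integer>();
        for (int i = 0; i < readers.length; i++) {
            map.put(readers[i], Integer.valueOf(bases[i]));
        }
        return map;
    }

    // Converts a segment-relative docId to an index-wide docId.
    static int toIndexWide(int docBase, int segmentDocId) {
        return docBase + segmentDocId;
    }
}
```

The Scorer and the Sorter could then both look up the reader they were handed and add its docBase to the segment-relative docId before touching the external data.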

Thanks for the good suggestions!

Peter

On Mon, Nov 16, 2009 at 5:16 PM, Michael McCandless <
[email protected]> wrote:

> Can you remap your external data to be per segment?  Presumably that
> would make reopens faster for your app.
>
> For your custom sort comparator, are you using FieldComparator?  If
> so, Lucene calls setNextReader to tell you the reader & docBase.
>
> Failing these, Lucene currently visits the readers in index order.
> So, you could accumulate the docBase by adding up the reader.maxDoc()
> for each reader you've seen.  However, this may change in future
> Lucene releases.
>
> You could also, externally, build your own map from SegmentReader ->
> docBase, by calling IndexReader.getSequentialSubReaders() and stepping
> through adding up the maxDoc.  Then, in your search, you can lookup
> the SegmentReader you're working on to get the docBase?
>
> Mike
>
> On Mon, Nov 16, 2009 at 2:50 PM, Peter Keegan <[email protected]>
> wrote:
> > The same thing is occurring in my custom sort comparator. The ScoreDocs
> > passed to the 'compare' method have docIds that seem to be relative to
> > the segment. Is there any way to translate these into index-wide docIds?
> >
> > Peter
> >
> > On Mon, Nov 16, 2009 at 2:06 PM, Peter Keegan <[email protected]> wrote:
> >
> >> I forgot to mention that this is with V2.9.1
> >>
> >>
> >> On Mon, Nov 16, 2009 at 1:39 PM, Peter Keegan <[email protected]> wrote:
> >>
> >>> I have a custom query object whose scorer uses the 'AllTermDocs' to get
> >>> all non-deleted documents. AllTermDocs returns the docId relative to
> >>> the segment, but I need the absolute (index-wide) docId to access
> >>> external data.
> >>> What's the best way to get the unique, non-deleted docId?
> >>>
> >>> Thanks,
> >>> Peter
> >>>
> >>
> >>
> >
>