Re: SortingMergePolicy for already sorted segments

Shai Erera Tue, 17 Jun 2014 05:13:29 -0700

>
> I am afraid the DocMap still maintains doc-id mappings till merge and I am
> trying to avoid it...
>


What do you mean by 'till merge'? OneMerge.getMergeReaders() is called only
when the merge is executed, not when the MergePolicy decides to merge those
segments. The DocMap is therefore initialized only when the merge actually
executes ... what more is there to postpone?

Besides, if the segments are already sorted, you should return a null
DocMap, as the Lucene code does ...
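
To make the two points above concrete, here is a minimal sketch in plain
Java. It does not use Lucene: DocMap, PendingMerge, execute() and
sortOrNull() are hypothetical stand-ins (execute() plays the role of
getMergeReaders()), and the int keys stand in for whatever value you sort
on. It only illustrates that nothing is computed until the merge runs, and
that already-sorted input yields a null map.

```java
import java.util.Arrays;
import java.util.Comparator;

public class LazyDocMapSketch {

    /** Hypothetical stand-in for a doc-ID mapping (old ID to new ID). */
    interface DocMap {
        int oldToNew(int docID);
    }

    /** Returns null when the keys are already in order, otherwise a full mapping. */
    static DocMap sortOrNull(final int[] keys) {
        boolean sorted = true;
        for (int i = 1; i < keys.length; ++i) {
            if (keys[i - 1] > keys[i]) {
                sorted = false;
                break;
            }
        }
        if (sorted) {
            return null; // nothing to remap, so no mapping is allocated
        }
        // Only unsorted input pays for the permutation arrays.
        Integer[] newToOld = new Integer[keys.length];
        for (int i = 0; i < keys.length; i++) {
            newToOld[i] = i;
        }
        Arrays.sort(newToOld, Comparator.comparingInt(d -> keys[d])); // stable sort
        final int[] oldToNew = new int[keys.length];
        for (int n = 0; n < newToOld.length; n++) {
            oldToNew[newToOld[n]] = n;
        }
        return docID -> oldToNew[docID];
    }

    /** Hypothetical stand-in for a merge that has been selected but not yet run. */
    static final class PendingMerge {
        private final int[] keys;
        int computeCount = 0;

        PendingMerge(int[] keys) {
            this.keys = keys; // selecting the merge does no sorting work
        }

        /** Called only when the merge actually executes. */
        DocMap execute() {
            computeCount++;
            return sortOrNull(keys);
        }
    }

    public static void main(String[] args) {
        PendingMerge merge = new PendingMerge(new int[] {1, 2, 2, 5});
        System.out.println("computed before execute: " + merge.computeCount);
        System.out.println("map is null: " + (merge.execute() == null));
    }
}
```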

If I'm missing your point, I'd appreciate it if you could point me to a code
example, preferably in the Lucene source, that demonstrates the problem.

Shai


On Tue, Jun 17, 2014 at 3:03 PM, Ravikumar Govindarajan <
[email protected]> wrote:

> I am afraid the DocMap still maintains doc-id mappings till merge and I am
> trying to avoid it...
>
> I think Lucene itself has a MergedIterator in the o.a.l.util package.
>
> A MergePolicy can wrap a simple MergedIterator for iterating docs across
> different AtomicReaders in the correct sort order for a given field/term.
>
> That should be fine, right?
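
For reference, the merge-iterator idea quoted above can be sketched as a
k-way merge over a priority queue. This is a plain-Java illustration, not
Lucene's own iterator and not something SortingMergePolicy accepts as-is:
DocRef and mergeSorted() are hypothetical names, and each "segment" is just
a list of sort keys that is assumed to be sorted already. The point is that
the queue holds one entry per segment rather than one per document.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

public class SortedSegmentMergeSketch {

    /** One position in the merged order: a segment index and a doc within it. */
    static final class DocRef {
        final int segment;
        final int doc;
        final String key;

        DocRef(int segment, int doc, String key) {
            this.segment = segment;
            this.doc = doc;
            this.key = key;
        }
    }

    /** Lazily yields docs from all segments in global key order. */
    static Iterator<DocRef> mergeSorted(final List<List<String>> segments) {
        // Order by key; break ties by segment index so the result is deterministic.
        Comparator<DocRef> byKey = (a, b) -> {
            int c = a.key.compareTo(b.key);
            return c != 0 ? c : Integer.compare(a.segment, b.segment);
        };
        final PriorityQueue<DocRef> heap = new PriorityQueue<>(byKey);
        // Seed the heap with the first doc of every non-empty segment.
        for (int s = 0; s < segments.size(); s++) {
            if (!segments.get(s).isEmpty()) {
                heap.add(new DocRef(s, 0, segments.get(s).get(0)));
            }
        }
        return new Iterator<DocRef>() {
            @Override
            public boolean hasNext() {
                return !heap.isEmpty();
            }

            @Override
            public DocRef next() {
                DocRef top = heap.poll();
                if (top == null) {
                    throw new NoSuchElementException();
                }
                // Advance only the segment that just produced a doc.
                List<String> seg = segments.get(top.segment);
                int nextDoc = top.doc + 1;
                if (nextDoc < seg.size()) {
                    heap.add(new DocRef(top.segment, nextDoc, seg.get(nextDoc)));
                }
                return top;
            }
        };
    }

    public static void main(String[] args) {
        List<List<String>> segments = Arrays.asList(
            Arrays.asList("a", "d"),
            Arrays.asList("b", "c", "e"));
        Iterator<DocRef> it = mergeSorted(segments);
        while (it.hasNext()) {
            DocRef r = it.next();
            System.out.println(r.key + " <- segment " + r.segment + ", doc " + r.doc);
        }
    }
}
```

Note that an iterator like this only yields the new order one doc at a time;
it does not give random access from an old doc ID to its new one.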
>
> --
> Ravi
>
>
> On Tue, Jun 17, 2014 at 1:24 PM, Shai Erera <[email protected]> wrote:
>
> > loadSortTerm is your method, right? In the current Sorter.sort
> > implementation, I see this code:
> >
> >     boolean sorted = true;
> >     for (int i = 1; i < maxDoc; ++i) {
> >       if (comparator.compare(i-1, i) > 0) {
> >         sorted = false;
> >         break;
> >       }
> >     }
> >     if (sorted) {
> >       return null;
> >     }
> >
> > Perhaps you can write similar code?
> >
> > Also note that the sorting interface has changed, I think in 4.8, and now
> > you don't really need to implement a Sorter, but rather pass a SortField,
> > if that works for you.
> >
> > Shai
> >
> >
> > On Tue, Jun 17, 2014 at 9:41 AM, Ravikumar Govindarajan <
> > [email protected]> wrote:
> >
> > > Shai,
> > >
> > > This is the code snippet I use inside my class...
> > >
> > > public class MySorter extends Sorter {
> > >
> > >   @Override
> > >   public DocMap sort(AtomicReader reader) throws IOException {
> > >     final Map<Integer, BytesRef> docVsId = loadSortTerm(reader);
> > >
> > >     final Sorter.DocComparator comparator = new Sorter.DocComparator() {
> > >       @Override
> > >       public int compare(int docID1, int docID2) {
> > >         BytesRef v1 = docVsId.get(docID1);
> > >         BytesRef v2 = docVsId.get(docID2);
> > >         return v1.compareTo(v2);
> > >       }
> > >     };
> > >
> > >     return sort(reader.maxDoc(), comparator);
> > >   }
> > > }
> > >
> > > My problem is, the "AtomicReader" passed to the Sorter.sort method is
> > > actually a SlowCompositeReader, composed of a list of AtomicReaders, each
> > > of which is already sorted.
> > >
> > > I find this "loadSortTerm(compositeReader)" to be a bit heavy, since it
> > > tries to load all the doc-to-term mappings eagerly...
> > >
> > > Are there some alternatives for this?
> > >
> > > --
> > > Ravi
> > >
> > >
> > > On Tue, Jun 17, 2014 at 10:58 AM, Shai Erera <[email protected]> wrote:
> > >
> > > > I'm not sure that I follow ... where do you see DocMap being loaded
> > > > up front? Specifically, Sorter.sort may return null if the readers are
> > > > already sorted ... I think we already optimized for the case where the
> > > > readers are sorted.
> > > >
> > > > Shai
> > > >
> > > >
> > > > On Tue, Jun 17, 2014 at 4:04 AM, Ravikumar Govindarajan <
> > > > [email protected]> wrote:
> > > >
> > > > > I am planning to use SortingMergePolicy where all the
> > > > > merge-participating segments are already sorted... I understand that
> > > > > I need to define a DocMap with old-new doc-id mappings.
> > > > >
> > > > > Is it possible to optimize the eager loading of DocMap and make it
> > > > > kind of lazy load on-demand?
> > > > >
> > > > > Ex: Pass List<AtomicReader> to the caller and ask for the next
> > > > > new-old doc mapping..
> > > > >
> > > > > Since my segments are already sorted, I could save a little bit of
> > > > > memory this way, instead of loading the full DocMap upfront.
> > > > >
> > > > > --
> > > > > Ravi
> > > > >
> > > >
> > >
> >
>
