Hi Ravi,

1. May I know which Lucene version you're using? As far as I know,
SortingMergePolicy has been deprecated and replaced by
IndexWriterConfig.setIndexSort in newer Lucene versions. So if
setIndexSort is available, I would suggest using it to get a sorted
index (as you might have already figured out, IndexRearranger lets you
pass in an IndexWriterConfig, so you can set it there). If it is not
available, I'm not sure whether the merge in addIndexes goes through the
merge policy; maybe you could check the source code and see?
2. Yeah, that's a good observation: we are doing multiple passes over one
segment! But I think the current default directory implementation is
MMapDirectory, which delegates caching to the operating system, so repeated
passes should already be served largely from the page cache. Here's a great
blog post explaining MMapDirectory in Lucene:
https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
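
To make the page-cache point a bit more concrete, here is a minimal sketch
in plain Java NIO. It is not Lucene's MMapDirectory code; the class name
MultiPassMmap and the method sumPasses are made up for illustration. It
maps a file once and scans it several times, which is roughly the access
pattern of running several selectors over the same segment: after the
first pass, the later passes read pages the OS most likely still has cached.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class, not part of Lucene.
public class MultiPassMmap {

    // Maps the file once (read-only), then scans the mapping `passes` times.
    // Returns the sum of all unsigned byte values seen in each pass.
    static long[] sumPasses(Path file, int passes) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long[] sums = new long[passes];
            for (int p = 0; p < passes; p++) {
                buf.rewind(); // restart from the beginning of the mapping
                long sum = 0;
                while (buf.hasRemaining()) {
                    sum += buf.get() & 0xFF; // treat each byte as unsigned
                }
                sums[p] = sum;
            }
            return sums;
        }
    }

    public static void main(String[] args) throws IOException {
        // A tiny temporary file stands in for a segment file.
        Path tmp = Files.createTempFile("segment", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {1, 2, 3, 4});
        for (long s : sumPasses(tmp, 3)) {
            System.out.println(s);
        }
    }
}
```

Every pass goes through the same mapping, so no extra application-level
cache is involved; whether the pages actually stay resident depends on how
much free memory the OS has relative to the index size.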

Best
Patrick

Ravikumar Govindarajan <ravikumar.govindara...@gmail.com> wrote on
Mon, May 24, 2021 at 9:54 AM:

> Thanks Michael!
>
> This was just what I was looking for! Just a couple of questions.
>
>
>    - When we call addIndexes(IndexReader...), does the merge happen via
>    MergePolicy? We use a SortingMergePolicy and would like to maintain the
>    sort-order in newly created segments too
>    - Concurrency is a cool trick here. But if I understand the patch
>    correctly, don't we end up doing multiple passes over the Term Dict, one
>    for each Selector? Loading it fully in memory could help here, possibly?
>
> --
> Ravi
>
> On Mon, May 24, 2021 at 7:37 PM Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
> > Are you trying to rewrite your already created index into a different
> > segment geometry?
> >
> > Maybe have a look at the new IndexRearranger tool
> > <https://issues.apache.org/jira/browse/LUCENE-9694>?  It is already doing
> > something like what you enumerated below, including mocking LiveDocs to get
> > the right documents into the right segments.
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> >
> > On Sat, May 22, 2021 at 3:50 PM Ravikumar Govindarajan <
> > ravikumar.govindara...@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> We have a use-case for index-rewrite on a "frozen index" where no new
> >> documents are added. It goes like this..
> >>
> >>    1. Get all segments for the index (base-segment-list)
> >>    2. Create a new segment from base-segment-list with unique set of docs
> >>    (LiveDocs)
> >>    3. Repeat step 2, for a fixed count. Like say 5 or 10 times
> >>
> >> Is something like this achievable via Merge Policy? We can disable commits
> >> too, till the full run is completed.
> >>
> >> Any help is appreciated
> >>
> >> Regards,
> >> Ravi
> >>
> >
>
