Hi Ravi,

1. May I know which Lucene version you're using? As far as I know, SortingMergePolicy has been deprecated and replaced by IndexWriterConfig.setIndexSort in newer Lucene versions. So if setIndexSort is available, I would suggest using it to get a sorted index (as you may have already figured out, IndexRearranger lets you pass in an IndexWriterConfig, so you can set it there). If it is not available, I'm not sure whether the merge will go through the merge policy; maybe you could check the source code and see?

2. Yes, that's a good observation: we are doing multiple passes over the same segment! But I think the current default directory implementation is MMapDirectory, which delegates caching to the operating system and should already handle this situation well. Here's a great blog post explaining MMapDirectory in Lucene: https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
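To make point 2 a bit more concrete, here is a rough sketch in plain Java. It is not Lucene code: the class name MmapMultiPass and the temp file are made up for illustration. It assumes, as far as I know, that MMapDirectory in current Lucene versions is built on the JDK's FileChannel.map. The file is mapped once and scanned several times; after the first pass the pages normally sit in the OS page cache, so the later passes should not need to go back to disk.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from Lucene.
public class MmapMultiPass {

    // One full scan of the mapped region; stands in for one selector's pass.
    static long scan(MappedByteBuffer buf) {
        long sum = 0;
        for (int i = 0; i < buf.limit(); i++) {
            sum += buf.get(i); // absolute read, does not move the buffer position
        }
        return sum;
    }

    // Maps the file once, then scans the same mapping `passes` times.
    static long[] multiPass(Path file, int passes) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long[] sums = new long[passes];
            for (int p = 0; p < passes; p++) {
                sums[p] = scan(buf);
            }
            return sums;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("segment", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {1, 2, 3, 4});
        // Every pass reads the same bytes through the same mapping.
        System.out.println(Arrays.toString(multiPass(tmp, 3)));
    }
}
```

Whether the later passes are really served from memory depends on how much RAM is free for the page cache, so this only shows the access pattern, not a guarantee.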
Best,
Patrick

Ravikumar Govindarajan <ravikumar.govindara...@gmail.com> wrote on Mon, May 24, 2021 at 9:54 AM:

> Thanks Michael!
>
> This was just what I was looking for!!. Just a couple of questions.
>
>
> - When we call addIndexes(IndexReader...), does the merge happen via
>   MergePolicy? We use a SortingMergePolicy and would like to maintain the
>   sort-order in newly created segments too
> - Concurrency is a cool-trick here. But if I understand the patch
>   correctly, don't we end-up doing multiple passes over the Term Dict, one
>   for each Selector? Loading it fully in memory could help here, possibly?
>
> --
> Ravi
>
> On Mon, May 24, 2021 at 7:37 PM Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
> > Are you trying to rewrite your already created index into a different
> > segment geometry?
> >
> > Maybe have a look at the new IndexRearranger tool
> > <https://issues.apache.org/jira/browse/LUCENE-9694>? It is already doing
> > something like what you enumerated below, including mocking LiveDocs to get
> > the right documents into the right segments.
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> >
> > On Sat, May 22, 2021 at 3:50 PM Ravikumar Govindarajan <
> > ravikumar.govindara...@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> We have a use-case for index-rewrite on a "frozen index" where no new
> >> documents are added. It goes like this..
> >>
> >> 1. Get all segments for the index (base-segment-list)
> >> 2. Create a new segment from base-segment-list with unique set of docs
> >>    (LiveDocs)
> >> 3. Repeat step 2, for a fixed count. Like say 5 or 10 times
> >>
> >> Is something like this achievable via Merge Policy? We can disable commits
> >> too, till the full run is completed.
> >>
> >> Any help is appreciated
> >>
> >> Regards,
> >> Ravi
> >>
> >
>