Thanks Patrick for the help!

> May I know what Lucene version you're using?

We are currently on an older Lucene version (4.7.x). I believe FilterCodecReader in the current version is roughly equivalent to FilterAtomicReader in ours, so it should do the job for us!

> If it is not available, I'm not sure whether the merge will happen via the
> merge policy; maybe you could check the source code and see?

I checked, and AFAIK our old version doesn't support it. But I guess it should be fine to wrap the reader in a SortingAtomicReader and pass it to the API, so I think it can be done!

> But I think the current default directory implementation is MMapDirectory,
> which delegates the caching to the operating system and should already
> optimize this situation.

We do use the default MMapDirectory, but I was actually thinking about unpacking/walking the terms dictionary data (FST) repeatedly from various threads, even when it is accessed via MMap. Are there optimizations here (caching unpacked blocks, etc.) that we could tap into?

--
Ravi

On Mon, May 24, 2021 at 11:09 PM Patrick Zhai <zhai7...@gmail.com> wrote:

> Hi Ravi,
>
> 1. May I know what Lucene version you're using? As far as I know,
> SortingMergePolicy has been deprecated and replaced by
> IndexWriterConfig.setIndexSort in newer Lucene versions. So if
> setIndexSort is available, I would suggest using it to get a sorted index
> (as you might have already figured out, IndexRearranger lets you pass in
> an IndexWriterConfig, so you could set it there). If it is not available,
> I'm not sure whether the merge will happen via the merge policy; maybe you
> could check the source code and see?
> 2. Yeah, it's a good observation: we're doing multiple passes over one
> segment! But I think the current default directory implementation is
> MMapDirectory, which delegates the caching to the operating system and
> should already optimize this situation.
> Here's a great blog post explaining MMapDirectory in Lucene:
> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> Best
> Patrick
>
> Ravikumar Govindarajan <ravikumar.govindara...@gmail.com> wrote on Mon,
> May 24, 2021 at 9:54 AM:
>
> > Thanks Michael!
> >
> > This is just what I was looking for! Just a couple of questions:
> >
> > - When we call addIndexes(IndexReader...), does the merge happen via the
> >   MergePolicy? We use a SortingMergePolicy and would like to maintain the
> >   sort order in the newly created segments too.
> > - Concurrency is a cool trick here. But if I understand the patch
> >   correctly, don't we end up doing multiple passes over the terms
> >   dictionary, one for each Selector? Could loading it fully into memory
> >   help here?
> >
> > --
> > Ravi
> >
> > On Mon, May 24, 2021 at 7:37 PM Michael McCandless <
> > luc...@mikemccandless.com> wrote:
> >
> > > Are you trying to rewrite your already-created index into a different
> > > segment geometry?
> > >
> > > Maybe have a look at the new IndexRearranger tool
> > > <https://issues.apache.org/jira/browse/LUCENE-9694>? It already does
> > > something like what you enumerated below, including mocking LiveDocs to
> > > get the right documents into the right segments.
> > >
> > > Mike McCandless
> > >
> > > http://blog.mikemccandless.com
> > >
> > > On Sat, May 22, 2021 at 3:50 PM Ravikumar Govindarajan <
> > > ravikumar.govindara...@gmail.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > We have a use case for rewriting a "frozen index", to which no new
> > > > documents are added. It goes like this:
> > > >
> > > > 1. Get all segments for the index (base-segment-list).
> > > > 2. Create a new segment from base-segment-list with a unique set of
> > > >    docs (LiveDocs).
> > > > 3. Repeat step 2 a fixed number of times, say 5 or 10.
> > > >
> > > > Is something like this achievable via a MergePolicy? We can also
> > > > disable commits until the full run is complete.
> > > >
> > > > Any help is appreciated.
> > > >
> > > > Regards,
> > > > Ravi
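The rewrite in the original message (steps 1 to 3) boils down to splitting the frozen index's live documents into N disjoint sets, one per new segment, which is the "mocking LiveDocs" idea Michael mentions for IndexRearranger. The plain-Java sketch below shows only that partitioning step. `LiveDocsPartitioner`, `partition` and `isExactPartition` are hypothetical names, not Lucene APIs, and it assumes doc IDs `0..maxDoc-1` form one ID space across the base segment list (real Lucene live docs are per segment, so you would partition per leaf or map global IDs back to leaves). Feeding each mask into Lucene 4.7, for example by adapting it to `Bits`, returning it from `getLiveDocs()` (and adjusting `numDocs()`) in a `FilterAtomicReader` subclass, then calling `IndexWriter.addIndexes(IndexReader...)` once per mask, is an untested assumption.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical helper (not a Lucene API): splits the live docs of a frozen
 * index into numParts disjoint "live docs" masks, one per output segment.
 * Assumes doc IDs 0..maxDoc-1 span the whole base segment list.
 */
public class LiveDocsPartitioner {

    /** Assigns every live doc to exactly one part, in contiguous runs of doc IDs. */
    public static List<BitSet> partition(int maxDoc, BitSet live, int numParts) {
        if (numParts <= 0) {
            throw new IllegalArgumentException("numParts must be positive");
        }
        List<BitSet> parts = new ArrayList<>(numParts);
        for (int i = 0; i < numParts; i++) {
            parts.add(new BitSet(maxDoc));
        }
        // Only bits below maxDoc count as documents.
        int liveCount = live.get(0, maxDoc).cardinality();
        // Ceiling division, so all live docs fit into numParts parts.
        int perPart = Math.max(1, (liveCount + numParts - 1) / numParts);
        int seen = 0;
        for (int doc = live.nextSetBit(0); doc >= 0 && doc < maxDoc; doc = live.nextSetBit(doc + 1)) {
            // Contiguous runs keep the existing doc order inside each part.
            parts.get(seen / perPart).set(doc);
            seen++;
        }
        return parts;
    }

    /** True if the parts are pairwise disjoint and together cover exactly the live docs. */
    public static boolean isExactPartition(int maxDoc, BitSet live, List<BitSet> parts) {
        BitSet union = new BitSet(maxDoc);
        for (BitSet part : parts) {
            if (union.intersects(part)) {
                return false; // some doc would be kept by two output segments
            }
            union.or(part);
        }
        return union.equals(live.get(0, maxDoc));
    }

    public static void main(String[] args) {
        int maxDoc = 10;
        BitSet live = new BitSet(maxDoc);
        live.set(0, maxDoc);
        live.clear(3); // doc 3 was already deleted in the frozen index
        List<BitSet> parts = partition(maxDoc, live, 3);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("segment " + i + " keeps " + parts.get(i));
        }
        // segment 0 keeps {0, 1, 2}
        // segment 1 keeps {4, 5, 6}
        // segment 2 keeps {7, 8, 9}
        System.out.println("exact partition: " + isExactPartition(maxDoc, live, parts));
        // exact partition: true
    }
}
```

Contiguous runs are used so that, if the frozen index is already sorted (for example by SortingMergePolicy), each new segment receives one contiguous slice of that order. A round-robin assignment would balance the parts just as well but interleave them.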