Have you considered leveraging Lucene's built-in index sorting? It supports concurrent indexing and is quite fast.
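For intuition on why an index sort gives a deterministic order even with concurrent indexing, here is a toy, stdlib-only Java model. It is not Lucene code. It roughly mirrors how an index sort is applied: each indexing thread sorts its own segment at flush time, and merging sorted segments keeps them sorted, so the final order depends only on the sort key and not on which thread indexed which document. It assumes the sort keys are unique. IndexSortDemo, Doc, flushSorted, mergeSorted and indexConcurrently are hypothetical names. In Lucene itself the sort is configured with IndexWriterConfig#setIndexSort(Sort).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model only (not Lucene code): sort each flushed segment, then merge
// the sorted segments so the final order depends only on the sort key.
final class IndexSortDemo {

    // Hypothetical document: a sort key (assumed unique) plus an id.
    static final class Doc {
        final long key;
        final String id;

        Doc(long key, String id) {
            this.key = key;
            this.id = id;
        }
    }

    // "Flush": copy one thread's buffer and sort it by the index sort (ascending key).
    static List<Doc> flushSorted(List<Doc> buffer) {
        List<Doc> segment = new ArrayList<>(buffer);
        segment.sort(Comparator.comparingLong((Doc d) -> d.key));
        return segment;
    }

    // "Merge": k-way merge of already-sorted segments; the output stays sorted.
    static List<Doc> mergeSorted(List<List<Doc>> segments) {
        // Each queue entry is {segmentIndex, positionInSegment}.
        PriorityQueue<int[]> queue = new PriorityQueue<>(
                Comparator.comparingLong((int[] e) -> segments.get(e[0]).get(e[1]).key));
        for (int s = 0; s < segments.size(); s++) {
            if (!segments.get(s).isEmpty()) {
                queue.add(new int[] {s, 0});
            }
        }
        List<Doc> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            int[] e = queue.poll();
            List<Doc> segment = segments.get(e[0]);
            merged.add(segment.get(e[1]));
            if (e[1] + 1 < segment.size()) {
                queue.add(new int[] {e[0], e[1] + 1});
            }
        }
        return merged;
    }

    // Each thread indexes an interleaved slice of the input and flushes its own
    // sorted segment; the segments are then merged into one sorted list.
    static List<Doc> indexConcurrently(List<Doc> docs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<Doc>>> flushed = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int tid = t;
                flushed.add(pool.submit(() -> {
                    List<Doc> buffer = new ArrayList<>();
                    for (int i = tid; i < docs.size(); i += threads) {
                        buffer.add(docs.get(i));
                    }
                    return flushSorted(buffer);
                }));
            }
            List<List<Doc>> segments = new ArrayList<>();
            for (Future<List<Doc>> f : flushed) {
                segments.add(f.get());
            }
            return mergeSorted(segments);
        } catch (InterruptedException | ExecutionException ex) {
            throw new IllegalStateException(ex);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] keys = {42, 7, 19, 3, 88, 25, 61, 10};
        List<Doc> docs = new ArrayList<>();
        for (int i = 0; i < keys.length; i++) {
            docs.add(new Doc(keys[i], "doc" + i));
        }
        List<Long> sortedKeys = new ArrayList<>();
        for (Doc d : indexConcurrently(docs, 3)) {
            sortedKeys.add(d.key);
        }
        System.out.println(sortedKeys); // [3, 7, 10, 19, 25, 42, 61, 88]
    }
}
```

With unique keys, running this with 1 or 3 threads yields the same document order, which is the determinism property the question is after. With a real index sort, Lucene does the per-segment sorting and order-preserving merges for you.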
On Fri, Dec 18, 2020 at 7:26 PM Haoyu Zhai <zhai7...@gmail.com> wrote:
> Hi,
> Our team is looking for a way to construct (or rebuild) a deterministic sorted
> index concurrently (I know Lucene can achieve that sequentially, but that is
> sometimes too slow for us).
> Currently we have roughly two ideas, both assuming there is a pre-built index
> and that we have dumped a doc-to-segment map so that IndexWriter can be
> aware of which doc belongs to which segment:
> 1. First build the index in the normal way (concurrently); after the index is
> built, use the "addIndexes" functionality to merge documents into the correct
> segment.
> 2. By controlling FlushPolicy and other related classes, make sure each
> segment created (before merging) contains only documents that belong to one of
> the segments in the pre-built index, and create a dedicated MergePolicy that
> only merges segments belonging to the same pre-built segment.
>
> Basically, we think the first one is easier to implement and the second one is
> faster. We'd like to hear ideas, suggestions, and feedback.
>
> Thanks,
> Patrick Zhai
>
--
Adrien