Have you considered leveraging Lucene's built-in index sorting? It supports
concurrent indexing and is quite fast.
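
For illustration, here is a minimal sketch of that suggestion. It is not
from the thread itself: it assumes a Lucene 8.x lucene-core jar on the
classpath, and the class name IndexSortSketch and the field name "sort_key"
are made up for the example. Several threads add documents to one
IndexWriter that has an index sort configured, and the sort keys are then
read back in doc ID order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class IndexSortSketch {

    // Hypothetical field name; the sort field must be indexed as doc values.
    static final String SORT_FIELD = "sort_key";

    /** Indexes keys 0..numDocs-1 from several threads and returns them in doc ID order. */
    static List<Long> indexConcurrently(int numDocs, int numThreads) throws Exception {
        Directory dir = new ByteBuffersDirectory();
        IndexWriterConfig config = new IndexWriterConfig();
        // Every flushed or merged segment keeps its documents in this order.
        config.setIndexSort(new Sort(new SortField(SORT_FIELD, SortField.Type.LONG)));

        try (IndexWriter writer = new IndexWriter(dir, config)) {
            // IndexWriter is thread-safe, so all threads share one instance.
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int t = 0; t < numThreads; t++) {
                final int offset = t;
                tasks.add(() -> {
                    for (long key = offset; key < numDocs; key += numThreads) {
                        Document doc = new Document();
                        doc.add(new NumericDocValuesField(SORT_FIELD, key));
                        writer.addDocument(doc);
                    }
                    return null;
                });
            }
            ExecutorService pool = Executors.newFixedThreadPool(numThreads);
            try {
                for (Future<Void> result : pool.invokeAll(tasks)) {
                    result.get(); // rethrow any indexing failure
                }
            } finally {
                pool.shutdown();
            }
            // Only for the demo: collapse to one segment so the whole index
            // has a single total order.
            writer.forceMerge(1);
        }

        List<Long> keys = new ArrayList<>();
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            for (LeafReaderContext leaf : reader.leaves()) {
                NumericDocValues values = leaf.reader().getNumericDocValues(SORT_FIELD);
                while (values.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
                    keys.add(values.longValue());
                }
            }
        }
        return keys;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(indexConcurrently(20, 4));
    }
}
```

Note that index sorting only fixes the order of documents within each
segment. Which documents end up in which segment still depends on flush and
merge timing, which is why the sketch calls forceMerge(1) before reading.
Documents with equal sort keys should also be expected to come out in an
order that depends on thread scheduling, so a unique key (or a tie-breaking
second SortField) is the safer assumption if the result has to be
reproducible.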

On Fri, Dec 18, 2020 at 7:26 PM Haoyu Zhai <zhai7...@gmail.com> wrote:

> Hi
> Our team is looking for a way to construct (or rebuild) a deterministic
> sorted index concurrently (I know Lucene can do that sequentially, but
> that is sometimes too slow for us).
> Currently we have roughly two ideas, both assuming there's a pre-built
> index and that we have dumped a doc-to-segment map so that IndexWriter
> knows which doc belongs to which segment:
> 1. First build the index in the normal way (concurrently); after the index
> is built, use the "addIndexes" functionality to merge documents into the
> correct segments.
> 2. By controlling FlushPolicy and other related classes, make sure each
> segment created (before merge) has only the documents that belong to one of
> the segments in the pre-built index. Then create a dedicated MergePolicy
> that only merges segments belonging to the same pre-built segment.
>
> Basically we think the first one is easier to implement and the second one
> is faster. We'd like to get ideas, suggestions, and feedback here.
>
> Thanks
> Patrick Zhai
>


-- 
Adrien
