I don't know about addIndexes. Does that let you control which document goes
into which segment? Wouldn't you have to select a subset of documents from each
originally indexed segment?
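On the subset question, here is one way to do it, sketched under assumptions. Wrap each leaf of an existing index in a `FilterCodecReader` whose `getLiveDocs()` hides every document outside the target segment. Then pass those views to `IndexWriter.addIndexes(CodecReader...)`, which skips non-live documents and, in Lucene 8.x, writes the result as one new segment. The `SubsetAddIndexes`, `SubsetReader`, and `DocSelector` names are hypothetical. Answering `keep(leafOrd, doc)` (for example, from the dumped doc-to-segment map) is left to the caller. The code targets the Lucene 8.x API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

import org.apache.lucene.index.CodecReader;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.FilterCodecReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.SlowCodecReaderWrapper;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.FixedBitSet;

final class SubsetAddIndexes {

  /** Read-only view of one segment that hides every document rejected by {@code keep}. */
  static final class SubsetReader extends FilterCodecReader {
    private final FixedBitSet live;
    private final int numDocs;

    SubsetReader(CodecReader in, IntPredicate keep) {
      super(in);
      Bits inLive = in.getLiveDocs(); // null means the segment has no deletions
      live = new FixedBitSet(in.maxDoc());
      for (int doc = 0; doc < in.maxDoc(); doc++) {
        if ((inLive == null || inLive.get(doc)) && keep.test(doc)) {
          live.set(doc);
        }
      }
      numDocs = live.cardinality();
    }

    @Override public Bits getLiveDocs() { return live; }
    @Override public int numDocs() { return numDocs; }
    // The view is throwaway, so opt out of reader caching.
    @Override public CacheHelper getCoreCacheHelper() { return null; }
    @Override public CacheHelper getReaderCacheHelper() { return null; }
  }

  /** Hypothetical selector: should segment-local doc {@code doc} of leaf {@code leafOrd} be kept? */
  interface DocSelector { boolean keep(int leafOrd, int doc); }

  /** Copies the selected documents from every leaf of {@code source} into {@code dest} in one addIndexes call. */
  static void addSubset(DirectoryReader source, IndexWriter dest, DocSelector selector) throws IOException {
    List<CodecReader> views = new ArrayList<>();
    for (LeafReaderContext ctx : source.leaves()) {
      final int leafOrd = ctx.ord;
      CodecReader reader = SlowCodecReaderWrapper.wrap(ctx.reader());
      views.add(new SubsetReader(reader, doc -> selector.keep(leafOrd, doc)));
    }
    dest.addIndexes(views.toArray(new CodecReader[0]));
  }
}
```

Calling `addSubset` once per target segment, in segment order, would rebuild the segments one at a time. Segment-local doc IDs differ between builds, so a real selector would probably need to look documents up by a stored ID rather than by doc ID.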

On Sat, Dec 19, 2020, 12:11 PM Michael Sokolov wrote:

> I think the idea is to exert control over the distribution of documents
> among the segments, in a deterministic reproducible way.
>
> On Sat, Dec 19, 2020, 11:39 AM Adrien Grand wrote:
>
>> Have you considered leveraging Lucene's built-in index sorting? It
>> supports concurrent indexing and is quite fast.
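Adrien's suggestion can be sketched as follows. This is a minimal example assuming Lucene 8.x and a hypothetical `sortKey` numeric doc-values field. It sets an index sort on `IndexWriterConfig` and feeds a single `IndexWriter` from several threads. Every flushed segment is written in sort order, and merges keep that order, so after `forceMerge(1)` doc IDs follow `sortKey`. Index sorting orders documents within segments. It does not by itself decide which documents land in which segment. The class and helper names are illustrative.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.store.Directory;

final class SortedConcurrentIndexing {
  // Hypothetical sort field; the index sort can use any doc-values field of a matching type.
  static final String SORT_FIELD = "sortKey";

  /** Indexes {@code keys} from {@code threads} threads into one writer whose segments are sorted by SORT_FIELD. */
  static void index(Directory dir, long[] keys, int threads) throws Exception {
    IndexWriterConfig iwc = new IndexWriterConfig();
    iwc.setIndexSort(new Sort(new SortField(SORT_FIELD, SortField.Type.LONG)));
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try (IndexWriter writer = new IndexWriter(dir, iwc)) {
      List<Future<?>> pending = new ArrayList<>();
      for (int t = 0; t < threads; t++) {
        final int start = t;
        pending.add(pool.submit(() -> {
          // Each thread takes every threads-th key; IndexWriter is thread-safe.
          for (int i = start; i < keys.length; i += threads) {
            Document doc = new Document();
            doc.add(new NumericDocValuesField(SORT_FIELD, keys[i]));
            writer.addDocument(doc);
          }
          return null;
        }));
      }
      for (Future<?> f : pending) {
        f.get(); // rethrows any indexing failure
      }
      writer.forceMerge(1); // the merged segment is also sorted
    } finally {
      pool.shutdown();
    }
  }

  /** Returns the sort-key values in doc ID order, leaf by leaf. */
  static List<Long> keysInDocOrder(Directory dir) throws IOException {
    List<Long> out = new ArrayList<>();
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      for (LeafReaderContext ctx : reader.leaves()) {
        NumericDocValues dv = ctx.reader().getNumericDocValues(SORT_FIELD);
        if (dv == null) {
          continue;
        }
        for (int doc = dv.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = dv.nextDoc()) {
          out.add(dv.longValue());
        }
      }
    }
    return out;
  }
}
```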
>>
>> On Fri, Dec 18, 2020 at 7:26 PM Haoyu Zhai wrote:
>>
>>> Hi
>>> Our team is looking for a way to construct (or rebuild) a deterministic
>>> sorted index concurrently. (I know Lucene can do this sequentially, but
>>> that is sometimes too slow for us.)
>>> Currently we have roughly two ideas, both assuming there is a pre-built
>>> index and that we have dumped a doc-to-segment map, so that IndexWriter
>>> can know which document belongs to which segment:
>>> 1. First build the index in the normal way (concurrently); once it is
>>> built, use the "addIndexes" functionality to merge documents into the
>>> correct segments.
>>> 2. Control FlushPolicy and related classes so that each segment created
>>> (before merging) contains only documents belonging to a single segment of
>>> the pre-built index, and write a dedicated MergePolicy that only merges
>>> segments belonging to the same pre-built segment.
>>>
>>> We think the first one is easier to implement and the second one is
>>> faster. We'd welcome any ideas, suggestions, or feedback.
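Both ideas start from the dumped doc-to-segment map. Below is a minimal, Lucene-free sketch of the planning step. It assumes the dump is a list of (external doc ID, pre-built segment ordinal) rows. The sketch groups documents per target segment deterministically and builds the groups concurrently. It then collects the results in segment order, regardless of which thread finishes first, because `ExecutorService.invokeAll` returns futures in task order. `SegmentPlan`, `Entry`, and the `buildSegment` callback are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class SegmentPlan {
  /** One row of the hypothetical dump: external doc ID plus the pre-built segment that held it. */
  static final class Entry {
    final String docId;
    final int segment;

    Entry(String docId, int segment) {
      this.docId = docId;
      this.segment = segment;
    }
  }

  /** Groups doc IDs by target segment: segments in ordinal order, docs in dump order within each. */
  static Map<Integer, List<String>> plan(List<Entry> dump) {
    Map<Integer, List<String>> bySegment = new TreeMap<>();
    for (Entry e : dump) {
      bySegment.computeIfAbsent(e.segment, s -> new ArrayList<>()).add(e.docId);
    }
    return bySegment;
  }

  /** Runs buildSegment on every group concurrently; results come back in segment-ordinal order. */
  static <R> List<R> buildAll(Map<Integer, List<String>> plan,
                              Function<List<String>, R> buildSegment,
                              int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Callable<R>> tasks = new ArrayList<>();
      for (List<String> docs : plan.values()) {
        tasks.add(() -> buildSegment.apply(docs));
      }
      List<R> results = new ArrayList<>();
      for (Future<R> f : pool.invokeAll(tasks)) { // invokeAll keeps task order
        results.add(f.get());
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }
}
```

In a Lucene setting, `buildSegment` might index one group into its own `Directory` and call `forceMerge(1)`. The ordered directories could then go to `IndexWriter.addIndexes(Directory...)`, which copies segments instead of re-merging them. Pairing this with a merge policy such as `NoMergePolicy.INSTANCE` would keep the copied segments from being merged afterwards. This is a variation on idea 1, offered as an assumption rather than a tested recipe.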
>>>
>>> Thanks
>>> Patrick Zhai
>>>
>>
>>
>> --
>> Adrien
>>
>
