Re: Is there a way to customize segment names?

Uwe Schindler Sat, 17 Dec 2022 03:13:58 -0800

Hi,

have you thought about storing additional metadata in the commits? If you want to have custom information in each commit, just save some internal tracking identifiers so you can figure out if node A or node B is primary by checking their latest commit metadata.
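A minimal sketch of that idea, in plain Java rather than against Lucene itself. In Lucene the map would be attached with `IndexWriter.setLiveCommitData(...)` before committing and read back with `IndexCommit.getUserData()`; the class name `CommitMetadata`, the two key names, and the "higher epoch wins" rule are all assumptions made for illustration, not something the thread prescribes.

```java
import java.util.Map;

// Hypothetical helper, not part of Lucene: models the user data a writer
// could store in each commit so that readers can tell writers apart.
final class CommitMetadata {
    // Assumed key names; any stable strings would do.
    static final String NODE_ID = "writer.nodeId";
    static final String EPOCH = "writer.epoch";

    // Builds the map a writer would attach to its next commit.
    public static Map<String, String> userData(String nodeId, long epoch) {
        return Map.of(NODE_ID, nodeId, EPOCH, Long.toString(epoch));
    }

    // Reads the epoch back; commits without the key count as epoch -1.
    public static long epochOf(Map<String, String> userData) {
        String value = userData.get(EPOCH);
        return value == null ? -1L : Long.parseLong(value);
    }

    // Assumed rule: the commit with the higher epoch comes from the
    // current primary. On a tie the first argument wins.
    public static String primaryOf(Map<String, String> a, Map<String, String> b) {
        return epochOf(a) >= epochOf(b) ? a.get(NODE_ID) : b.get(NODE_ID);
    }

    public static void main(String[] args) {
        Map<String, String> nodeA = userData("node-A", 7);
        Map<String, String> nodeB = userData("node-B", 8);
        System.out.println("primary: " + primaryOf(nodeA, nodeB));
    }
}
```

A searcher could run a check like `primaryOf` on the latest commit of each indexer before deciding which one to pull segments from. How the epoch gets bumped on failover is left to whatever component elects the primary.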

Generally I do not understand your request: do you want to give segments some completely custom filenames, for example through setters? This is impossible to do. If you just want another "algorithm" to generate new segment names (it is actually some base32-like ID) you can patch Lucene, but this would not solve your problem.
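To illustrate why a different naming "algorithm" alone does not help, here is a small sketch of count-based naming. It mimics, and does not call, Lucene: as far as I can tell from `IndexWriter`, names are an underscore followed by a counter rendered with `Character.MAX_RADIX` (radix 36). The class `SegmentNamer` is hypothetical.

```java
// Hypothetical stand-in for count-based segment naming; not Lucene code.
final class SegmentNamer {
    private long counter;

    public SegmentNamer(long startCounter) {
        this.counter = startCounter;
    }

    // Returns the next name and advances the counter.
    public String next() {
        return "_" + Long.toString(counter++, Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        // Both indexers synced up to the same commit, so both counters
        // start at the same value.
        SegmentNamer indexer1 = new SegmentNamer(4);
        SegmentNamer indexer2 = new SegmentNamer(4);
        String first = indexer1.next();
        String second = indexer2.next();
        // Same name, although the two segments hold different data.
        System.out.println(first + " vs " + second + " collide: " + first.equals(second));
    }
}
```

Two writers that start from the same commit hand out the same next name for different content. Changing the name format would only move the ambiguity elsewhere, since the two indexes have still diverged.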


Uwe

Am 17.12.2022 um 01:28 schrieb Patrick Zhai:

Hi Mike, Robert

Thanks for replying. The system is almost like what Mike has described: one writer is primary, and the other is trying to catch up and wait. But in our internal discussion we found there might be a small chance that the secondary mistakenly thinks it is the primary (due to errors in other components) while the primary is still alive, and thus gets into the situation I described.

And because we want to tolerate the error in case we can't prevent it from happening, we're looking at customizing filenames.

Thanks again for discussing this with me. I've learnt that playing with filenames can become quite troublesome, but still, even out of my own curiosity, I want to understand whether we're able to control the segment names in some way?

Best
Patrick

On Fri, Dec 16, 2022 at 6:36 AM Michael Sokolov <[email protected]> wrote:


    +1 trying to coordinate multiple writers running independently will
    not work. My 2c for availability: you can have a single primary active
    writer with a backup one waiting, receiving all the segments from the
    primary. Then if the primary goes down, the secondary one has the most
    recent commit replicated from the primary (identical commit, same
    segments etc) and can pick up from there. You would need a mechanism
    to replay the writes the primary never had a chance to commit.

    On Fri, Dec 16, 2022 at 5:41 AM Robert Muir <[email protected]> wrote:
    >
    > You are still talking "Multiple writers". Like i said, going down this
    > path (playing tricks with filenames) isn't going to work out well.
    >
    > On Fri, Dec 16, 2022 at 2:48 AM Patrick Zhai <[email protected]> wrote:
    > >
    > > Hi Robert,
    > >
    > > Maybe I didn't explain it clearly but we're not going to constantly switch
    > > between writers or share effort between writers, it's purely for
    > > availability: the second writer only kicks in when the first writer is not
    > > available for some reason.
    > > And as far as I know the replicator/nrt module has not provided a solution
    > > on when the primary node (main indexer) is down, how would we recover with
    > > a back up indexer?
    > >
    > > Thanks
    > > Patrick
    > >
    > >
    > > On Thu, Dec 15, 2022 at 7:16 PM Robert Muir <[email protected]> wrote:
    > >
    > > > This multiple-writer isn't going to work and customizing names won't
    > > > allow it anyway. Each file also contains a unique identifier tied to
    > > > its commit so that we know everything is intact.
    > > >
    > > > I would look at the segment replication in lucene/replicator and not
    > > > try to play games with files and mixing multiple writers.
    > > >
    > > > On Thu, Dec 15, 2022 at 5:45 PM Patrick Zhai <[email protected]> wrote:
    > > > >
    > > > > Hi Folks,
    > > > >
    > > > > We're trying to build a search architecture using segment replication
    > > > > (indexer and searcher are separated and indexer shipping new segments to
    > > > > searchers) right now and one of the problems we're facing is: for
    > > > > availability reason we need to have multiple indexers running, and when the
    > > > > searcher is switching from consuming one indexer to another, there are
    > > > > chances where the segment names collide with each other (because segment
    > > > > names are count based) and the searcher have to reload the whole index.
    > > > > To avoid that we're looking for a way to name the segments so that
    > > > > Lucene is able to tell the difference and load only the difference (by
    > > > > calling `openIfChanged`). I've checked the IndexWriter and the
    > > > > DocumentsWriter and it seems it is controlled by a private final method
    > > > > `newSegmentName()` so likely not possible there. So I wonder whether
    > > > > there's any other ways people are aware of that can help control the
    > > > > segment names?
    > > > >
    > > > > A example of the situation described above:
    > > > > Searcher previously consuming from indexer 1, and have following
    > > > > segments: _1, _2, _3, _4
    > > > > Indexer 2 previously sync'd from indexer 1, sharing the first 3
    > > > > segments, and produced its own 4th segments (notioned as _4', but it shares
    > > > > the same "_4" name): _1, _2, _3, _4'
    > > > > Suddenly Indexer 1 dies and searcher switched from Indexer 1 to Indexer
    > > > > 2, then when it finished downloading the segments and trying to refresh the
    > > > > reader, it will likely hit the exception here, and seems all we can do
    > > > > right now is to reload the whole index and that could be potentially a high
    > > > > cost.
    > > > >
    > > > > Sorry for the long email and thank you in advance for any replies!
    > > > >
    > > > > Best
    > > > > Patrick
    > > > >
    > > >
    > > >
    > > > ---------------------------------------------------------------------
    > > > To unsubscribe, e-mail: [email protected]
    > > > For additional commands, e-mail: [email protected]
    > > >
    > > >
    >
    >

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail:[email protected]
