Hi Robert, got it, thanks!

Hi Uwe, yes, we do have a way to detect whether a segment was created by
node A or node B even if the two share the same name. However, Lucene does
not allow that situation (same name, but generated by a different writer)
when `openIfChanged` is called to incrementally load the new index. So what
I want is to attach a prefix (or suffix, anything really) to the segment
name, say "A_4" and "B_4", so that when DirectoryReader runs
`openIfChanged` it proceeds without throwing an exception.
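
To make the collision concrete, here is a minimal sketch in plain Java. It
does not use Lucene at all: `CountingWriter` is a hypothetical stand-in for
a writer's segment counter, and the naming rule (underscore plus the counter
in base 36) is my reading of how IndexWriter names segments, so treat it as
an assumption rather than a statement of the actual implementation.

```java
public class SegmentNameSketch {

    // Hypothetical stand-in for one indexer's segment counter; not a Lucene class.
    static final class CountingWriter {
        private long counter;

        CountingWriter(long startCounter) {
            this.counter = startCounter;
        }

        // Assumed naming rule: "_" followed by the counter in base 36.
        String nextSegmentName() {
            return "_" + Long.toString(counter++, Character.MAX_RADIX);
        }
    }

    public static void main(String[] args) {
        // Both indexers share segments _1.._3, so both counters sit at 4.
        CountingWriter indexerA = new CountingWriter(4);
        CountingWriter indexerB = new CountingWriter(4);

        String fromA = indexerA.nextSegmentName();
        String fromB = indexerB.nextSegmentName();

        // Same name, different contents: this is the case the reader cannot reconcile.
        System.out.println("A wrote " + fromA + ", B wrote " + fromB
                + ", same name: " + fromA.equals(fromB));
    }
}
```

Because each counter advances independently from the same starting point,
nothing in the name itself records which node produced the segment, which
is why I was asking about a per-node prefix.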

On Sat, Dec 17, 2022 at 4:56 AM Robert Muir <rcm...@gmail.com> wrote:

> No, you can't control them. And we must not open up anything to try to
> support this.
>
> On Fri, Dec 16, 2022 at 7:28 PM Patrick Zhai <zhai7...@gmail.com> wrote:
> >
> > Hi Mike, Robert
> >
> > Thanks for replying, the system is almost like what Mike has described:
> > one writer is primary, and the other is trying to catch up and wait, but
> > in our internal discussion we found there might be a small chance that
> > the secondary mistakenly thinks of itself as primary (due to errors in
> > other components) while the primary is still alive, and thus ends up in
> > the situation I described. And because we want to tolerate the error in
> > case we can't prevent it from happening, we're looking at customizing
> > filenames.
> >
> > Thanks again for discussing this with me and I've learnt that playing
> > with filenames can become quite troublesome, but still, even out of my
> > own curiosity, I want to understand whether we're able to control the
> > segment names in some way?
> >
> > Best
> > Patrick
> >
> >
> > On Fri, Dec 16, 2022 at 6:36 AM Michael Sokolov <msoko...@gmail.com> wrote:
> >>
> >> +1 trying to coordinate multiple writers running independently will
> >> not work. My 2c for availability: you can have a single primary active
> >> writer with a backup one waiting, receiving all the segments from the
> >> primary. Then if the primary goes down, the secondary one has the most
> >> recent commit replicated from the primary (identical commit, same
> >> segments etc) and can pick up from there. You would need a mechanism
> >> to replay the writes the primary never had a chance to commit.
> >>
> >> On Fri, Dec 16, 2022 at 5:41 AM Robert Muir <rcm...@gmail.com> wrote:
> >> >
> >> > You are still talking "Multiple writers". Like I said, going down this
> >> > path (playing tricks with filenames) isn't going to work out well.
> >> >
> >> > On Fri, Dec 16, 2022 at 2:48 AM Patrick Zhai <zhai7...@gmail.com> wrote:
> >> > >
> >> > > Hi Robert,
> >> > >
> >> > > Maybe I didn't explain it clearly but we're not going to constantly
> >> > > switch between writers or share effort between writers, it's purely
> >> > > for availability: the second writer only kicks in when the first
> >> > > writer is not available for some reason.
> >> > > And as far as I know the replicator/nrt module has not provided a
> >> > > solution for the case where the primary node (main indexer) is down:
> >> > > how would we recover with a backup indexer?
> >> > >
> >> > > Thanks
> >> > > Patrick
> >> > >
> >> > >
> >> > > On Thu, Dec 15, 2022 at 7:16 PM Robert Muir <rcm...@gmail.com> wrote:
> >> > >
> >> > > > This multiple-writer isn't going to work and customizing names
> >> > > > won't allow it anyway. Each file also contains a unique identifier
> >> > > > tied to its commit so that we know everything is intact.
> >> > > >
> >> > > > I would look at the segment replication in lucene/replicator and
> >> > > > not try to play games with files and mixing multiple writers.
> >> > > >
> >> > > > On Thu, Dec 15, 2022 at 5:45 PM Patrick Zhai <zhai7...@gmail.com> wrote:
> >> > > > >
> >> > > > > Hi Folks,
> >> > > > >
> >> > > > > We're trying to build a search architecture using segment
> >> > > > > replication (indexer and searcher are separated, and the indexer
> >> > > > > ships new segments to searchers) right now, and one of the
> >> > > > > problems we're facing is: for availability reasons we need to
> >> > > > > have multiple indexers running, and when the searcher switches
> >> > > > > from consuming one indexer to another, there is a chance that
> >> > > > > segment names collide with each other (because segment names are
> >> > > > > count based) and the searcher has to reload the whole index.
> >> > > > > To avoid that we're looking for a way to name the segments so
> >> > > > > that Lucene is able to tell the difference and load only the
> >> > > > > difference (by calling `openIfChanged`). I've checked the
> >> > > > > IndexWriter and the DocumentsWriter and it seems it is controlled
> >> > > > > by a private final method `newSegmentName()` so likely not
> >> > > > > possible there. So I wonder whether there are any other ways
> >> > > > > people are aware of that can help control the segment names?
> >> > > > >
> >> > > > > An example of the situation described above:
> >> > > > > The searcher was previously consuming from indexer 1, and has the
> >> > > > > following segments: _1, _2, _3, _4
> >> > > > > Indexer 2 previously sync'd from indexer 1, sharing the first 3
> >> > > > > segments, and produced its own 4th segment (denoted _4', but it
> >> > > > > shares the same "_4" name): _1, _2, _3, _4'
> >> > > > > Suddenly Indexer 1 dies and the searcher switches from Indexer 1
> >> > > > > to Indexer 2; then when it finishes downloading the segments and
> >> > > > > tries to refresh the reader, it will likely hit the exception
> >> > > > > here, and it seems all we can do right now is to reload the whole
> >> > > > > index, which could potentially be a high cost.
> >> > > > >
> >> > > > > Sorry for the long email and thank you in advance for any
> >> > > > > replies!
> >> > > > >
> >> > > > > Best
> >> > > > > Patrick
> >> > > > >
> >> > > >
> >> > > >
> >> > > > ---------------------------------------------------------------------
> >> > > > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> > > > For additional commands, e-mail: dev-h...@lucene.apache.org
> >> > > >
> >> > > >
> >> >
> >>
>
