Hi Robert, got it, thanks!

Hi Uwe, yes, we do have a way to detect whether a segment was created by node A or node B even if the two share the same name. However, Lucene does not allow that situation (same name but generated by a different writer) when `openIfChanged` is called to incrementally load the new index. So what I want is to attach a prefix (or postfix, anything lol) to the segment name, say "A_4" and "B_4", so that `DirectoryReader.openIfChanged` proceeds without throwing an exception.
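
To make the collision concrete, here is a minimal sketch of the detection step in plain Java. It does not use any Lucene API: the class `SegmentCollisionCheck`, the method `collidingNames`, and the string ids "A" and "B" are all made up for illustration, standing in for whatever per-segment identifier tells the two indexers apart. It only shows when an incremental refresh would be unsafe, assuming each side can be described as a map from segment name to writer id.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
final class SegmentCollisionCheck {

    // Returns the segment names that exist on both sides but carry different
    // writer ids, i.e. same name, different writer.
    static List<String> collidingNames(Map<String, String> current,
                                       Map<String, String> incoming) {
        List<String> collisions = new ArrayList<>();
        for (Map.Entry<String, String> entry : incoming.entrySet()) {
            String currentId = current.get(entry.getKey());
            if (currentId != null && !currentId.equals(entry.getValue())) {
                collisions.add(entry.getKey());
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        // Searcher state after consuming from indexer 1 (id "A").
        Map<String, String> searcher = new LinkedHashMap<>();
        searcher.put("_1", "A");
        searcher.put("_2", "A");
        searcher.put("_3", "A");
        searcher.put("_4", "A");

        // Indexer 2 shares the first three segments and wrote its own _4 (id "B").
        Map<String, String> indexer2 = new LinkedHashMap<>(searcher);
        indexer2.put("_4", "B");

        System.out.println(collidingNames(searcher, indexer2)); // prints [_4]
    }
}
```

A non-empty result is the case where the searcher currently has to fall back to reloading the whole index instead of refreshing incrementally.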

On Sat, Dec 17, 2022 at 4:56 AM Robert Muir <rcm...@gmail.com> wrote:

> No, you can't control them. And we must not open up anything to try to
> support this.
>
> On Fri, Dec 16, 2022 at 7:28 PM Patrick Zhai <zhai7...@gmail.com> wrote:
> >
> > Hi Mike, Robert
> >
> > Thanks for replying, the system is almost like what Mike has described:
> > one writer is primary, and the other is trying to catch up and wait, but
> > in our internal discussion we found there might be small chances where
> > the secondary mistakenly think itself as primary (due to errors of other
> > component) while primary is still alive and thus goes into the situation
> > I described.
> > And because we want to tolerate the error in case we can't prevent it
> > from happening, we're looking for customizing filenames.
> >
> > Thanks again for discussing this with me and I've learnt that playing
> > with filenames can become quite troublesome, but still, even out of my
> > own curiosity, I want to understand whether we're able to control the
> > segment names in some way?
> >
> > Best
> > Patrick
> >
> > On Fri, Dec 16, 2022 at 6:36 AM Michael Sokolov <msoko...@gmail.com> wrote:
> >>
> >> +1 trying to coordinate multiple writers running independently will
> >> not work. My 2c for availability: you can have a single primary active
> >> writer with a backup one waiting, receiving all the segments from the
> >> primary. Then if the primary goes down, the secondary one has the most
> >> recent commit replicated from the primary (identical commit, same
> >> segments etc) and can pick up from there. You would need a mechanism
> >> to replay the writes the primary never had a chance to commit.
> >>
> >> On Fri, Dec 16, 2022 at 5:41 AM Robert Muir <rcm...@gmail.com> wrote:
> >> >
> >> > You are still talking "Multiple writers". Like i said, going down this
> >> > path (playing tricks with filenames) isn't going to work out well.
> >> >
> >> > On Fri, Dec 16, 2022 at 2:48 AM Patrick Zhai <zhai7...@gmail.com> wrote:
> >> > >
> >> > > Hi Robert,
> >> > >
> >> > > Maybe I didn't explain it clearly but we're not going to constantly
> >> > > switch between writers or share effort between writers, it's purely
> >> > > for availability: the second writer only kicks in when the first
> >> > > writer is not available for some reason.
> >> > > And as far as I know the replicator/nrt module has not provided a
> >> > > solution on when the primary node (main indexer) is down, how would
> >> > > we recover with a back up indexer?
> >> > >
> >> > > Thanks
> >> > > Patrick
> >> > >
> >> > > On Thu, Dec 15, 2022 at 7:16 PM Robert Muir <rcm...@gmail.com> wrote:
> >> > >
> >> > > > This multiple-writer isn't going to work and customizing names won't
> >> > > > allow it anyway. Each file also contains a unique identifier tied to
> >> > > > its commit so that we know everything is intact.
> >> > > >
> >> > > > I would look at the segment replication in lucene/replicator and not
> >> > > > try to play games with files and mixing multiple writers.
> >> > > >
> >> > > > On Thu, Dec 15, 2022 at 5:45 PM Patrick Zhai <zhai7...@gmail.com> wrote:
> >> > > > >
> >> > > > > Hi Folks,
> >> > > > >
> >> > > > > We're trying to build a search architecture using segment
> >> > > > > replication (indexer and searcher are separated and indexer
> >> > > > > shipping new segments to searchers) right now and one of the
> >> > > > > problems we're facing is: for availability reason we need to
> >> > > > > have multiple indexers running, and when the searcher is
> >> > > > > switching from consuming one indexer to another, there are
> >> > > > > chances where the segment names collide with each other (because
> >> > > > > segment names are count based) and the searcher have to reload
> >> > > > > the whole index.
> >> > > > > To avoid that we're looking for a way to name the segments so
> >> > > > > that Lucene is able to tell the difference and load only the
> >> > > > > difference (by calling `openIfChanged`). I've checked the
> >> > > > > IndexWriter and the DocumentsWriter and it seems it is
> >> > > > > controlled by a private final method `newSegmentName()` so
> >> > > > > likely not possible there. So I wonder whether there's any other
> >> > > > > ways people are aware of that can help control the segment names?
> >> > > > >
> >> > > > > A example of the situation described above:
> >> > > > > Searcher previously consuming from indexer 1, and have following
> >> > > > > segments: _1, _2, _3, _4
> >> > > > > Indexer 2 previously sync'd from indexer 1, sharing the first 3
> >> > > > > segments, and produced its own 4th segments (notioned as _4',
> >> > > > > but it shares the same "_4" name): _1, _2, _3, _4'
> >> > > > > Suddenly Indexer 1 dies and searcher switched from Indexer 1 to
> >> > > > > Indexer 2, then when it finished downloading the segments and
> >> > > > > trying to refresh the reader, it will likely hit the exception
> >> > > > > here, and seems all we can do right now is to reload the whole
> >> > > > > index and that could be potentially a high cost.
> >> > > > >
> >> > > > > Sorry for the long email and thank you in advance for any replies!
> >> > > > >
> >> > > > > Best
> >> > > > > Patrick
> >> > > > >
> >> > > >
> >> > > > ---------------------------------------------------------------------
> >> > > > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> >> > > > For additional commands, e-mail: dev-h...@lucene.apache.org