No, you can't control them. And we must not open up anything to try to support this.
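The segment names that cannot be controlled here are, as Patrick's original message notes, count based and produced by the private `newSegmentName()` method. Below is a minimal, hypothetical model of that scheme using only the JDK. The class `SegmentNaming` and its counter are illustrative and are not Lucene code. It assumes names are an underscore followed by the counter in base 36, which is our understanding of recent Lucene versions. It shows why two writers that resume from the same replicated commit hand out the same name for segments with different contents.

```java
// Hypothetical model, not Lucene source: segment names come from a private,
// monotonically increasing counter that callers cannot influence.
final class SegmentNaming {
    // In this model the counter is restored from the last replicated commit.
    private long counter;

    SegmentNaming(long counterFromCommit) {
        this.counter = counterFromCommit;
    }

    // "_" followed by the counter in base 36 (Character.MAX_RADIX == 36).
    String newSegmentName() {
        return "_" + Long.toString(counter++, Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        // Indexer 2 synced the same commit as indexer 1, so both resume at 4.
        SegmentNaming indexer1 = new SegmentNaming(4);
        SegmentNaming indexer2 = new SegmentNaming(4);
        String a = indexer1.newSegmentName(); // "_4" written by indexer 1
        String b = indexer2.newSegmentName(); // "_4'" written by indexer 2
        // Prints: _4 vs _4, same name: true
        System.out.println(a + " vs " + b + ", same name: " + a.equals(b));
    }
}
```

Because the counter lives inside the writer and only moves forward, the name alone cannot distinguish `_4` from `_4'`.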
On Fri, Dec 16, 2022 at 7:28 PM Patrick Zhai <zhai7...@gmail.com> wrote:
>
> Hi Mike, Robert
>
> Thanks for replying. The system is almost like what Mike has described: one writer is primary, and the other is trying to catch up and wait. But in our internal discussion we found there is a small chance that the secondary mistakenly thinks it is the primary (due to errors in other components) while the primary is still alive, which leads to the situation I described.
> Because we want to tolerate that error in case we can't prevent it from happening, we're looking at customizing filenames.
>
> Thanks again for discussing this with me. I've learnt that playing with filenames can become quite troublesome, but still, even out of my own curiosity, I want to understand: are we able to control the segment names in some way?
>
> Best
> Patrick
>
>
> On Fri, Dec 16, 2022 at 6:36 AM Michael Sokolov <msoko...@gmail.com> wrote:
>>
>> +1, trying to coordinate multiple writers running independently will not work. My 2c for availability: you can have a single primary active writer with a backup one waiting, receiving all the segments from the primary. Then if the primary goes down, the secondary one has the most recent commit replicated from the primary (identical commit, same segments, etc.) and can pick up from there. You would need a mechanism to replay the writes the primary never had a chance to commit.
>>
>> On Fri, Dec 16, 2022 at 5:41 AM Robert Muir <rcm...@gmail.com> wrote:
>> >
>> > You are still talking about "multiple writers". Like I said, going down this path (playing tricks with filenames) isn't going to work out well.
>> >
>> > On Fri, Dec 16, 2022 at 2:48 AM Patrick Zhai <zhai7...@gmail.com> wrote:
>> > >
>> > > Hi Robert,
>> > >
>> > > Maybe I didn't explain it clearly, but we're not going to constantly switch between writers or share effort between writers; it's purely for availability: the second writer only kicks in when the first writer is not available for some reason.
>> > > And as far as I know, the replicator/nrt module does not provide a solution for when the primary node (main indexer) is down: how would we recover with a backup indexer?
>> > >
>> > > Thanks
>> > > Patrick
>> > >
>> > >
>> > > On Thu, Dec 15, 2022 at 7:16 PM Robert Muir <rcm...@gmail.com> wrote:
>> > >
>> > > > This multiple-writer approach isn't going to work, and customizing names won't allow it anyway. Each file also contains a unique identifier tied to its commit so that we know everything is intact.
>> > > >
>> > > > I would look at the segment replication in lucene/replicator and not try to play games with files and mixing multiple writers.
>> > > >
>> > > > On Thu, Dec 15, 2022 at 5:45 PM Patrick Zhai <zhai7...@gmail.com> wrote:
>> > > > >
>> > > > > Hi Folks,
>> > > > >
>> > > > > We're trying to build a search architecture using segment replication (indexer and searcher are separated, and the indexer ships new segments to the searchers). One of the problems we're facing is that, for availability reasons, we need to have multiple indexers running, and when the searcher switches from consuming one indexer to another, the segment names may collide (because segment names are count based), so the searcher has to reload the whole index.
>> > > > > To avoid that, we're looking for a way to name the segments so that Lucene is able to tell the difference and load only what changed (by calling `openIfChanged`). I've checked the IndexWriter and the DocumentsWriter, and it seems naming is controlled by a private final method `newSegmentName()`, so it's likely not possible there. So I wonder whether there are any other ways people are aware of that can help control the segment names?
>> > > > >
>> > > > > An example of the situation described above:
>> > > > > The searcher was previously consuming from indexer 1 and has the following segments: _1, _2, _3, _4
>> > > > > Indexer 2 previously synced from indexer 1, sharing the first 3 segments, and produced its own 4th segment (denoted _4', but it shares the same "_4" name): _1, _2, _3, _4'
>> > > > > Suddenly indexer 1 dies and the searcher switches from indexer 1 to indexer 2. When it finishes downloading the segments and tries to refresh the reader, it will likely hit the exception here, and it seems all we can do right now is reload the whole index, which could potentially be costly.
>> > > > >
>> > > > > Sorry for the long email and thank you in advance for any replies!
>> > > > >
>> > > > > Best
>> > > > > Patrick
>> > > > >
>> > > >
>> > > > ---------------------------------------------------------------------
>> > > > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> > > > For additional commands, e-mail: dev-h...@lucene.apache.org
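Robert's point is that every file also carries an identifier tied to its commit, so a matching name does not imply matching contents. Below is a hypothetical, simplified model of that check using only the JDK. It is not Lucene's implementation. Each commit is a map from segment name to an opaque id string that stands in for the per-segment identifier, and the ids (`a1`, `b4`, ...) are made up. Applied to Patrick's example, it separates segments that are identical in both commits from names that collide with different contents.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model: a segment is identified by its name plus an opaque id.
final class SegmentDiff {
    private SegmentDiff() {}

    // Same name AND same id in both commits: the contents match.
    static Set<String> reusable(Map<String, String> current, Map<String, String> next) {
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, String> e : next.entrySet()) {
            if (e.getValue().equals(current.get(e.getKey()))) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    // Same name, different id: e.g. _4 from indexer 1 vs _4' from indexer 2.
    static Set<String> conflicting(Map<String, String> current, Map<String, String> next) {
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, String> e : next.entrySet()) {
            String oldId = current.get(e.getKey());
            if (oldId != null && !oldId.equals(e.getValue())) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> fromIndexer1 = Map.of("_1", "a1", "_2", "a2", "_3", "a3", "_4", "a4");
        Map<String, String> fromIndexer2 = Map.of("_1", "a1", "_2", "a2", "_3", "a3", "_4", "b4");
        System.out.println("reusable: " + reusable(fromIndexer1, fromIndexer2));       // [_1, _2, _3]
        System.out.println("conflicting: " + conflicting(fromIndexer1, fromIndexer2)); // [_4]
    }
}
```

This only models detecting the collision. It does not imply that Lucene's reader can reopen across such a fork. The thread's advice is to avoid independent multiple writers and to look at lucene/replicator instead.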