Re: Is there a way to customize segment names?

Robert Muir Sat, 17 Dec 2022 04:57:02 -0800

No, you can't control them. And we must not open up anything to try to
support this.
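
For anyone following along, here is a minimal sketch of why the names collide in the scenario quoted below. It assumes the writer derives each new segment name from a per-index counter rendered in base 36, and that each segment also carries a random 16-byte id. The class and method names (`SegmentNameCollision`, `Writer`, `Segment`, `sameSegment`) are hypothetical stand-ins, not Lucene API.

```java
import java.security.SecureRandom;
import java.util.Arrays;

public class SegmentNameCollision {

    /** Hypothetical stand-in for a segment: a counter-derived name plus a random 16-byte id. */
    static final class Segment {
        final String name;
        final byte[] id;

        Segment(String name, byte[] id) {
            this.name = name;
            this.id = id;
        }
    }

    /** Hypothetical stand-in for a writer that resumes from a commit whose counter is known. */
    static final class Writer {
        private final SecureRandom random = new SecureRandom();
        private long counter;

        Writer(long counterFromCommit) {
            this.counter = counterFromCommit;
        }

        Segment newSegment() {
            byte[] id = new byte[16];
            random.nextBytes(id);
            // Assumed naming scheme: "_" followed by the counter in base 36; the counter then advances.
            String name = "_" + Long.toString(counter++, Character.MAX_RADIX);
            return new Segment(name, id);
        }
    }

    /** Two segments are the same only if both the name and the id match. */
    static boolean sameSegment(Segment a, Segment b) {
        return a.name.equals(b.name) && Arrays.equals(a.id, b.id);
    }

    public static void main(String[] args) {
        // Both indexers start from the same replicated commit, so both counters are at 4.
        Writer indexer1 = new Writer(4);
        Writer indexer2 = new Writer(4);
        Segment a = indexer1.newSegment();
        Segment b = indexer2.newSegment();
        System.out.println(a.name + " vs " + b.name + ", same segment? " + sameSegment(a, b));
    }
}
```

Under these assumptions both writers hand out the same next name while the ids differ, which is why a reader that already holds one "_4" cannot treat the other "_4" as an incremental change, and why renaming alone would not make the two histories compatible.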


On Fri, Dec 16, 2022 at 7:28 PM Patrick Zhai <[email protected]> wrote:
>
> Hi Mike, Robert
>
> Thanks for replying. The system is almost like what Mike has described: one
> writer is primary, and the other is trying to catch up and wait. But in our
> internal discussion we found there might be a small chance that the secondary
> mistakenly thinks it is the primary (due to errors in other components) while
> the primary is still alive, and thus ends up in the situation I described.
> And because we want to tolerate the error in case we can't prevent it from
> happening, we're looking into customizing filenames.
>
> Thanks again for discussing this with me. I've learnt that playing with
> filenames can become quite troublesome, but still, even out of my own
> curiosity, I want to understand whether we're able to control the segment
> names in some way?
>
> Best
> Patrick
>
>
> On Fri, Dec 16, 2022 at 6:36 AM Michael Sokolov <[email protected]> wrote:
>>
>> +1 trying to coordinate multiple writers running independently will
>> not work. My 2c for availability: you can have a single primary active
>> writer with a backup one waiting, receiving all the segments from the
>> primary. Then if the primary goes down, the secondary one has the most
>> recent commit replicated from the primary (identical commit, same
>> segments etc) and can pick up from there. You would need a mechanism
>> to replay the writes the primary never had a chance to commit.
>>
>> On Fri, Dec 16, 2022 at 5:41 AM Robert Muir <[email protected]> wrote:
>> >
>> > You are still talking "multiple writers". Like I said, going down this
>> > path (playing tricks with filenames) isn't going to work out well.
>> >
>> > On Fri, Dec 16, 2022 at 2:48 AM Patrick Zhai <[email protected]> wrote:
>> > >
>> > > Hi Robert,
>> > >
>> > > Maybe I didn't explain it clearly, but we're not going to constantly
>> > > switch between writers or share effort between writers. It's purely for
>> > > availability: the second writer only kicks in when the first writer is
>> > > not available for some reason.
>> > > And as far as I know the replicator/nrt module does not provide a
>> > > solution for when the primary node (main indexer) is down. How would we
>> > > recover with a backup indexer?
>> > >
>> > > Thanks
>> > > Patrick
>> > >
>> > >
>> > > On Thu, Dec 15, 2022 at 7:16 PM Robert Muir <[email protected]> wrote:
>> > >
>> > > > This multiple-writer approach isn't going to work, and customizing
>> > > > names won't allow it anyway. Each file also contains a unique
>> > > > identifier tied to its commit so that we know everything is intact.
>> > > >
>> > > > I would look at the segment replication in lucene/replicator and not
>> > > > try to play games with files and mixing multiple writers.
>> > > >
>> > > > On Thu, Dec 15, 2022 at 5:45 PM Patrick Zhai <[email protected]> wrote:
>> > > > >
>> > > > > Hi Folks,
>> > > > >
>> > > > > We're trying to build a search architecture using segment replication
>> > > > > (indexer and searcher are separated, with the indexer shipping new
>> > > > > segments to searchers) right now, and one of the problems we're facing
>> > > > > is: for availability reasons we need to have multiple indexers running,
>> > > > > and when the searcher switches from consuming one indexer to another,
>> > > > > there is a chance that the segment names collide with each other
>> > > > > (because segment names are count based) and the searcher has to reload
>> > > > > the whole index.
>> > > > > To avoid that we're looking for a way to name the segments so that
>> > > > > Lucene is able to tell the difference and load only the difference (by
>> > > > > calling `openIfChanged`). I've checked the IndexWriter and the
>> > > > > DocumentsWriter and it seems it is controlled by a private final method
>> > > > > `newSegmentName()`, so likely not possible there. So I wonder whether
>> > > > > there are any other ways people are aware of that can help control the
>> > > > > segment names?
>> > > > >
>> > > > > An example of the situation described above:
>> > > > > The searcher was previously consuming from indexer 1 and has the
>> > > > > following segments: _1, _2, _3, _4
>> > > > > Indexer 2 previously synced from indexer 1, sharing the first 3
>> > > > > segments, and produced its own 4th segment (denoted as _4', but it
>> > > > > shares the same "_4" name): _1, _2, _3, _4'
>> > > > > Suddenly indexer 1 dies and the searcher switches from indexer 1 to
>> > > > > indexer 2. When it finishes downloading the segments and tries to
>> > > > > refresh the reader, it will likely hit the exception here, and it seems
>> > > > > all we can do right now is reload the whole index, which could
>> > > > > potentially be a high cost.
>> > > > >
>> > > > > Sorry for the long email and thank you in advance for any replies!
>> > > > >
>> > > > > Best
>> > > > > Patrick
>> > > > >
>> > > >
>> > > > ---------------------------------------------------------------------
>> > > > To unsubscribe, e-mail: [email protected]
>> > > > For additional commands, e-mail: [email protected]
>> > > >
>> > > >
>> >
>>
