Erick Erickson commented on SOLR-11069:

I'm dithering back and forth about this. I suspect that we're conflating a 
couple of issues. There's definitely a problem with bootstrapping (I'll attach 
a patch in a minute). It may well be that LASTPROCESSEDVERSION is not 
actually a problem: at least in some testing (with the attached patch), the 
fact that it is -1 when buffering is enabled seems to be OK.

I propose we use the patch as a starting point to see if this 
LASTPROCESSEDVERSION is a problem or not.

1> when buffering is enabled, tlogs will accrue forever according to the 
original intent. From Renaud:

The original goal of the buffer in CDCR is indeed to keep the tlogs 
indefinitely until the buffer is deactivated. 
This was useful, for example, during maintenance operations, to ensure that the 
source cluster will keep all the tlogs until the target cluster is properly 
initialised. In this scenario, one will activate the buffer on the source. The 
source will start to store all the tlogs (and does not purge them). Once the 
target cluster is initialised, and has registered a tlog pointer on the source, 
one can deactivate the buffer on the source and the tlogs will start to be 
purged once they are read by the target cluster.
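
As a rough illustration of the retention rule described above (not Solr's 
actual code), here is a toy Python model. The class name, its fields and the 
"purge up to the slowest reader" rule are assumptions made for this sketch. 
The thread does not say what happens when the buffer is off and no reader is 
registered, so the model conservatively purges nothing in that case.

```python
from dataclasses import dataclass, field


@dataclass
class SourceTlogModel:
    """Hypothetical model of tlog retention on a CDCR source; not Solr code."""

    buffer_enabled: bool = True
    # One version number per tlog, oldest first (illustrative values only).
    tlog_versions: list = field(default_factory=list)
    # Target name -> last version that target has read (its "log pointer").
    reader_pointers: dict = field(default_factory=dict)

    def purgeable(self):
        """Return the tlog versions this model would allow to be purged."""
        # While the buffer is enabled, every tlog is kept, readers or not.
        if self.buffer_enabled:
            return []
        # Assumption: with no registered reader there is nothing to purge up to.
        if not self.reader_pointers:
            return []
        # Assumption: a tlog may go once the slowest reader has passed it.
        slowest = min(self.reader_pointers.values())
        return [v for v in self.tlog_versions if v <= slowest]


model = SourceTlogModel(tlog_versions=[10, 20, 30])
model.reader_pointers["target"] = 20   # target has registered and read up to 20
kept_while_buffering = model.purgeable()
model.buffer_enabled = False           # maintenance over, buffer deactivated
purged_after_disable = model.purgeable()
```

In this model nothing is purgeable while the buffer is on, even though a 
reader is registered; once the buffer is off, only the tlogs the target has 
already read become eligible.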

But additionally he had this to say:
Regarding the issue about LPV = -1, I am a bit surprised as this sentinel value 
should be used only when the source cluster does not have any log pointers, 
i.e., no target clusters were configured and initialised with this source 
cluster. In this case it indicates that there is no registered log reader, and 
that we should not remove any tlogs if buffer is enabled (as we have to wait 
for the target to register a log reader and log pointer). 

And enabling buffering definitely causes LASTPROCESSEDVERSION to return -1. 
However, with the patch, LPV goes back to a reasonable value as soon as 
buffering is disabled, and the tlogs get cleaned up, etc., without 
bootstrapping. So I do wonder if the -1 value is just overloaded in this case 
to also mean "don't purge tlogs".

We need to disentangle a couple of things. I'll attach a patch in a few minutes 
that might help.
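
To make the suspected overloading of -1 concrete, here is a small Python 
sketch of the behaviour observed in testing. It is only a model: the function 
names, the use of a dict for log pointers and the min-over-pointers rule are 
assumptions for illustration, not the CDCR implementation.

```python
NO_VERSION = -1  # sentinel value discussed in this thread


def last_processed_version(buffer_enabled, log_pointers):
    """Model of the observed LPV behaviour (hypothetical, not Solr code).

    log_pointers maps a target name to the last version it has read.
    """
    # Documented meaning: no target has registered a log pointer yet.
    if not log_pointers:
        return NO_VERSION
    # Observed extra meaning: buffering is on, so report the sentinel too.
    if buffer_enabled:
        return NO_VERSION
    # Assumption: otherwise report the version the slowest reader has reached.
    return min(log_pointers.values())


def may_purge(lpv):
    """In this model the sentinel doubles as "don't purge tlogs"."""
    return lpv != NO_VERSION


pointers = {"target": 1573}  # illustrative version number
lpv_buffering = last_processed_version(True, pointers)
lpv_after_disable = last_processed_version(False, pointers)
```

Under these assumptions the same registered pointer yields -1 while 
buffering and its real value immediately after buffering is disabled, which 
matches what the testing above showed.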

> LASTPROCESSEDVERSION for CDCR is flawed when buffering is enabled
> -----------------------------------------------------------------
>                 Key: SOLR-11069
>                 URL: https://issues.apache.org/jira/browse/SOLR-11069
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: CDCR
>    Affects Versions: 7.0
>            Reporter: Amrit Sarkar
>            Assignee: Erick Erickson
> {{LASTPROCESSEDVERSION}} (abbreviated LPV) action for CDCR breaks down due to 
> poorly initialised and maintained buffer log for either source or target 
> cluster core nodes.
> If buffer is enabled for cores of either source or target cluster, it returns 
> {{-1}}, *irrespective of number of entries in tlog read by the {{leader}}* 
> node of each shard of respective collection of respective cluster. Once 
> disabled, it starts telling us the correct LPV for each core.
> Due to the same flawed behavior, Update Log Synchroniser may not work 
> as expected, i.e. it provides an incorrect seek for the {{non-leader}} 
> nodes to advance at. I am not sure whether this is an intended behavior for 
> sync but it surely doesn't feel right.
