Erick Erickson updated SOLR-11069:
    Attachment: SOLR-11069.patch

Figuring out the LPV issue was hard because bootstrapping had a problem. At the 
end of the process, the core is reloaded. However, that means that the code 
that checks on the state of the replication returns a "notfound", which causes 
another bootstrap command to be sent.

So this patch moves the relevant objects to (Default)SolrCoreState where 
they're preserved around core reloads. With this patch (PoC) I can get 
bootstrapping to occur, enable/disable buffering, bring the target up and down 
etc. The fact that LPV is -1 when buffering is enabled doesn't seem to be a 
problem.

So if others can give this a whirl and see if their testing is OK with it then 
maybe the LPV issue is not an issue.

Mostly I'm throwing this out for others to consider. What do people think about 
putting the additional objects in SolrCoreState? Putting the objects there was 
quick; I'm interested in seeing if my results work for others. If so we can 
decide whether this is the right way to go.
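
For anyone trying to picture the reload problem, here is a toy sketch. It is 
not code from the patch: ReloadSketch, SharedCoreState, Core and 
countBootstraps are hypothetical stand-ins, not Solr classes. It assumes only 
what is described above: a reload creates a fresh core object, and a status 
check that finds no record answers "notfound" and triggers another bootstrap.

```java
import java.util.HashMap;
import java.util.Map;

public class ReloadSketch {
    // Hypothetical stand-in for an object that outlives core reloads
    // (the role SolrCoreState plays in the proposal).
    static class SharedCoreState {
        final Map<String, String> bootstrapStatus = new HashMap<>();
    }

    // Hypothetical stand-in for a core; a reload builds a new instance.
    static class Core {
        final SharedCoreState shared;
        final Map<String, String> localStatus = new HashMap<>(); // lost on reload

        Core(SharedCoreState shared) {
            this.shared = shared;
        }

        Core reload() {
            // The shared state is handed over; per-core state starts empty.
            return new Core(shared);
        }
    }

    // Polls the bootstrap status three times and counts how many
    // bootstrap commands end up being sent.
    static int countBootstraps(boolean keepStateInSharedObject) {
        Core core = new Core(new SharedCoreState());
        int bootstrapsSent = 0;
        for (int poll = 0; poll < 3; poll++) {
            Map<String, String> status = keepStateInSharedObject
                    ? core.shared.bootstrapStatus
                    : core.localStatus;
            String state = status.getOrDefault("bootstrap", "notfound");
            if (state.equals("notfound")) {
                bootstrapsSent++;                     // send a bootstrap command
                status.put("bootstrap", "completed"); // record the result
                core = core.reload();                 // bootstrap ends with a reload
            }
        }
        return bootstrapsSent;
    }

    public static void main(String[] args) {
        System.out.println("per-core state: " + countBootstraps(false));
        System.out.println("shared state:   " + countBootstraps(true));
    }
}
```

With the status kept on the core itself, every poll after a reload sees 
"notfound" and bootstraps again; with the status kept in the shared object, 
only the first poll does.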

Haven't run precommit, haven't run the full test suite. Did run 
CdcrBootstrapTest. Also, the CDCR docs need to be updated.

> LASTPROCESSEDVERSION for CDCR is flawed when buffering is enabled
> -----------------------------------------------------------------
>                 Key: SOLR-11069
>                 URL: https://issues.apache.org/jira/browse/SOLR-11069
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: CDCR
>    Affects Versions: 7.0
>            Reporter: Amrit Sarkar
>            Assignee: Erick Erickson
>         Attachments: SOLR-11069.patch
> The {{LASTPROCESSEDVERSION}} (abbreviated LPV) action for CDCR breaks down due 
> to a poorly initialised and maintained buffer log for either source or target 
> cluster core nodes.
> If buffer is enabled for cores of either source or target cluster, it returns 
> {{-1}}, *irrespective of number of entries in tlog read by the {{leader}}* 
> node of each shard of respective collection of respective cluster. Once 
> disabled, it starts telling us the correct LPV for each core.
> Due to the same flawed behavior, the Update Log Synchroniser may not work 
> as expected, i.e. it provides an incorrect seek position for the {{non-leader}} 
> nodes to advance to. I am not sure whether this is an intended behavior for 
> sync but it surely doesn't feel right.
