[
https://issues.apache.org/jira/browse/SOLR-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15249622#comment-15249622
]
Renaud Delbru commented on SOLR-6465:
-------------------------------------
It would be great indeed to be able to simplify the code as you proposed if we
can rely on a bootstrap method. Below are some observations that might be
useful.
One concern I have is the default size limit of the update log. By default, it
keeps 10 tlog files or 100 records. This will likely be too small to provide
enough buffer for cdcr, and there is a risk of a continuous cycle of
bootstrapping replication. One could increase the values of "numRecordsToKeep"
and "maxNumLogsToKeep" in solrconfig to accommodate the cdcr requirements. But
this is an additional parameter that the user needs to take into consideration,
and it makes configuration more complex. I am wondering if we could find a more
appropriate default value for cdcr?
The issue with increasing the limits in the original update log, compared to
the cdcr update log, is that the original update log will not clean up old tlog
files that are no longer necessary for the replication (it will keep all tlogs
up to that limit). For example, if one increases maxNumLogsToKeep to 100 and
numRecordsToKeep to 1000, then the node will always have 100 tlog files or 1000
records in the update logs, even if all of them have been replicated to the
target clusters. This might cause unexpected issues related to disk space or
performance.
The CdcrUpdateLog managed this by allowing a variable-size update log that
removes a tlog once it has been fully replicated. But then this means we go
back to where we were, with all the added management around the cdcr update
log, i.e., buffer, lastprocessedversion, CdcrLogSynchronizer, ...
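To make the difference between the two retention policies concrete, here is a
minimal Java sketch. It is a toy model, not Solr code: TlogFile,
TlogRetentionSketch and both method names are hypothetical, and a tlog is
reduced to the highest update version it contains.

```java
import java.util.List;

/** Toy stand-in for a tlog file: only the highest update version it holds. */
final class TlogFile {
    final long maxVersion;

    TlogFile(long maxVersion) {
        this.maxVersion = maxVersion;
    }
}

/** Hypothetical sketch (not Solr's API) comparing two tlog retention policies. */
final class TlogRetentionSketch {

    /** Fixed limit: up to maxLogs files are kept, replicated or not. */
    static int retainedWithFixedLimit(List<TlogFile> logs, int maxLogs) {
        return Math.min(logs.size(), maxLogs);
    }

    /** Replication-aware: a file is kept only while it holds unforwarded updates. */
    static int retainedWhenReplicationAware(List<TlogFile> logs,
                                            long lastReplicatedVersion) {
        int retained = 0;
        for (TlogFile log : logs) {
            if (log.maxVersion > lastReplicatedVersion) {
                retained++;
            }
        }
        return retained;
    }
}
```

With 150 tlogs and a limit of 100, the fixed policy in this model still holds
100 files after everything has been forwarded, while the replication-aware
policy holds none.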
h4. Cdcr Buffer
If we get rid of the cdcr update log logic, then we can also get rid of the
Cdcr Buffer (buffer state, buffer commands, etc.)
h4. CdcrUpdateLog
I am not sure if we can get rid of the CdcrUpdateLog entirely. It includes
logic such as sub-reader and forward seek that are necessary for sending batch
updates. Maybe this logic can be moved into the UpdateLog?
h4. CdcrLogSynchronizer
I think it is safe to get rid of this. In the case where a leader goes down
while a cdcr reader is forwarding updates, the new leader will likely miss the
tlogs necessary to resume where the cdcr reader stopped. But in this case, it
can fall back to bootstrapping.
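The fallback described above could be sketched as follows. This is only an
illustration under stated assumptions: ResumeDecisionSketch, Action and
decide are hypothetical names, versions are assumed to increase
monotonically, and the retained tlogs are assumed to have no internal gaps.

```java
import java.util.OptionalLong;

/** Hypothetical sketch (not Solr's API): resume from tlogs or bootstrap. */
final class ResumeDecisionSketch {

    enum Action { RESUME_FROM_TLOG, BOOTSTRAP }

    /**
     * @param lastForwardedVersion  last update version the previous cdcr
     *                              reader forwarded to the target
     * @param oldestRetainedVersion oldest update version still present in the
     *                              new leader's tlogs; empty if it has none
     */
    static Action decide(long lastForwardedVersion,
                         OptionalLong oldestRetainedVersion) {
        if (!oldestRetainedVersion.isPresent()) {
            // No tlogs at all, so there is nothing to resume from.
            return Action.BOOTSTRAP;
        }
        // Resuming is safe only if the retained tlogs reach back to the last
        // forwarded update; otherwise some updates would be skipped.
        return oldestRetainedVersion.getAsLong() <= lastForwardedVersion
                ? Action.RESUME_FROM_TLOG
                : Action.BOOTSTRAP;
    }
}
```

For example, decide(100, OptionalLong.of(150)) selects BOOTSTRAP in this
model, because the updates between versions 100 and 150 are no longer
available on the new leader.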
h4. Tlog Replication
If the tlogs are not replicated during a bootstrap, then the tlogs on the
target will not be in sync. Could this cause any issues on the target cluster,
e.g., in case of a recovery?
If the target is itself configured as a source (i.e. daisy chain), this will
probably cause issues. The update logs will likely contain gaps, and it will be
very difficult for the source to know that there is a gap. Therefore, it might
forward incomplete updates. But this might be a feature we could drop, as
suggested in one of your comments on the cwiki.
> CDCR: fall back to whole-index replication when tlogs are insufficient
> ----------------------------------------------------------------------
>
> Key: SOLR-6465
> URL: https://issues.apache.org/jira/browse/SOLR-6465
> Project: Solr
> Issue Type: Sub-task
> Reporter: Yonik Seeley
> Attachments: SOLR-6465.patch, SOLR-6465.patch
>
>
> When the peer-shard doesn't have transaction logs to forward all the needed
> updates to bring a peer up to date, we need to fall back to normal
> replication.