[ 
https://issues.apache.org/jira/browse/SOLR-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15024675#comment-15024675
 ] 

Renaud Delbru commented on SOLR-8263:
-------------------------------------

[~shalinmangar] Yes, you understood the sequence correctly. To be more precise, 
here is how it works:
1) The tlog files of the leader are downloaded into a temporary directory.
2) After the files have been downloaded successfully, a write lock is acquired 
by the IndexFetcher. The original tlog directory is renamed to a backup 
directory, and the temporary directory is renamed to the active tlog directory.
3) The update log is reset with the new active tlog directory. During this 
reset, the recovery info is used to read the backup buffered tlog file, and 
every buffered operation is copied to the new buffered tlog.
4) The write lock is released, and the recovery operation continues and 
applies the buffered updates.
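
To make steps 2 and 4 concrete, here is a minimal sketch of the directory swap 
under a write lock. It is not the code from the patch: the class name 
TlogDirectorySwap, the swap method and the three path parameters are 
hypothetical, and step 3 (resetting the update log and copying the buffered 
operations) is only marked by a comment. It also assumes the three directories 
live on the same file system, so that each rename is a plain move.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; not the IndexFetcher implementation.
final class TlogDirectorySwap {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /**
     * Renames activeDir to backupDir, then downloadedDir to activeDir,
     * while holding the write lock. Returns the backup directory.
     */
    Path swap(Path activeDir, Path downloadedDir, Path backupDir) throws IOException {
        lock.writeLock().lock();
        try {
            // Step 2: keep the original tlog directory as a backup.
            Files.move(activeDir, backupDir);
            // Step 2: the downloaded files become the active tlog directory.
            Files.move(downloadedDir, activeDir);
            // Step 3 would go here: reset the update log on activeDir and copy
            // the buffered operations found in backupDir (not shown).
            return backupDir;
        } finally {
            // Step 4: release the lock so that recovery can continue.
            lock.writeLock().unlock();
        }
    }
}
```

Holding the lock across both renames is what prevents a writer from seeing the 
moment where no active tlog directory exists.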

Indeed, the buffered tlog can contain operations that duplicate those in the 
replica's update log. During the recovery operation, the replica might receive 
operations from the leader that are buffered but that are also present in one 
of the tlogs downloaded from the leader. Apart from the disk space used by 
these duplicate operations and the additional network transfer, there is no 
harm, as these duplicate operations will be ignored by the peer cluster. 
We could improve the tlog recovery operation to de-duplicate the buffered tlog 
while copying the buffered updates: check the version of the latest operation 
in the downloaded tlog, and skip operations from the buffered tlog whose 
version is lower than the latest known. It should be a relatively small patch. 
I can try to work on that in the next few days and submit something, if that's 
fine with you and [~erickerickson]?
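
A minimal sketch of that de-duplication idea, to show how small the change 
could be. Everything here is hypothetical: the Op class is a stand-in for a 
tlog entry, and latestVersion and copyNewerBuffered are names made up for the 
illustration, not UpdateLog methods. It assumes versions are positive and 
increase over time, and it also skips an operation whose version equals the 
latest known one, on the assumption that the same operation is already in the 
downloaded tlog.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; real tlog entries and the UpdateLog API differ.
final class BufferedTlogDedup {

    /** Stand-in for one tlog operation: a version plus an opaque payload. */
    static final class Op {
        final long version;
        final String payload;

        Op(long version, String payload) {
            this.version = version;
            this.payload = payload;
        }
    }

    /** Highest version found in the downloaded tlog, or Long.MIN_VALUE if it is empty. */
    static long latestVersion(List<Op> downloadedTlog) {
        long latest = Long.MIN_VALUE;
        for (Op op : downloadedTlog) {
            latest = Math.max(latest, op.version);
        }
        return latest;
    }

    /** Copies only the buffered operations that are newer than latestKnown. */
    static List<Op> copyNewerBuffered(List<Op> bufferedTlog, long latestKnown) {
        List<Op> kept = new ArrayList<>();
        for (Op op : bufferedTlog) {
            if (op.version > latestKnown) {
                kept.add(op); // not covered by the downloaded tlog
            }
            // otherwise: already present in the downloaded tlog, so skip it
        }
        return kept;
    }
}
```

With an empty downloaded tlog, latestVersion returns Long.MIN_VALUE, so every 
buffered operation is kept and the behaviour matches the copy without 
de-duplication.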



> Tlog replication could interfere with the replay of buffered updates
> --------------------------------------------------------------------
>
>                 Key: SOLR-8263
>                 URL: https://issues.apache.org/jira/browse/SOLR-8263
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Renaud Delbru
>            Assignee: Erick Erickson
>         Attachments: SOLR-8263-trunk-1.patch, SOLR-8263-trunk-2.patch
>
>
> The current implementation of the tlog replication might interfere with the 
> replay of the buffered updates. The current tlog replication works as follows:
> 1) Fetch the tlog files from the master.
> 2) Reset the update log before switching the tlog directory.
> 3) Switch the tlog directory and re-initialise the update log with the new 
> directory.
> Currently there is no logic to keep "buffered updates" while resetting and 
> re-initialising the update log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
