[
https://issues.apache.org/jira/browse/DERBY-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582949#action_12582949
]
Jørgen Løland commented on DERBY-3562:
--------------------------------------
Mike Matrigali writes:
> I didn't quite follow all of this, and admit I am not up on replication.
> It would be nice if this process used the exact same code as the normal
> checkpoint processing. So a checkpoint would be triggered and then,
> after it had done its work, it would do the appropriate cleanup. If you
> do the cleanup too soon then redo recovery of the slave won't work - is
> that expected to work, or at that point do you just restart from scratch
> from the master?
> The existing code that replays multiple checkpoints may be weird, as it
> may assume that this is recovery of a backed-up database that is meant
> to keep all of its log files. Make sure not to break that.
> Is there a concept of a "fully" recoverable slave, i.e. one that is
> supposed to keep all of its log files so that it is recoverable in
> case of a data crash? As I said, this may not be necessary as there is
> always the master. Just good to know what is expected.
Mike,
Thank you for expressing your concerns. I'll do my best to explain why I think
the proposed solution will work.
The patch adds functionality to the checkpoint processing used during recovery
(LogToFile#checkpointInRFR). During recovery, the dirty data pages are flushed
to disk, and the log.ctrl file is updated to point to the new checkpoint
currently being processed.
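In case a sketch helps, here is roughly what those two recovery-time steps amount to. This is not the actual LogToFile code; every name in the sketch is a placeholder I made up for the illustration:

    // Conceptual sketch of a recovery-time checkpoint -- not Derby's actual code.
    // All names below are placeholders invented for this illustration.
    class RecoveryCheckpointSketch {

        /**
         * @param checkpointInstant log instant of the checkpoint record
         *        currently being processed by recovery.
         */
        void checkpointDuringRecovery(long checkpointInstant) {
            // Step 1: force every dirty data page to disk, so that no page in
            // seg0 depends on log records older than this checkpoint.
            flushDirtyDataPages();

            // Step 2: point log.ctrl at this checkpoint, so that a later
            // recovery starts from here rather than from an older checkpoint.
            updateLogControlFile(checkpointInstant);
        }

        void flushDirtyDataPages()              { /* stand-in for the page cache flush */ }
        void updateLogControlFile(long instant) { /* stand-in for rewriting log.ctrl   */ }
    }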
With the patch [1], the log files that are older than the currently processed
checkpoint's Undo Low Water Mark (undo LWM) are then deleted. The undo LWM
points to the earliest log record that may be required for recovery [2].
Since the log files are processed sequentially and the dirty data pages have
been flushed, the undo LWM in the checkpoint is just as valid during recovery
(i.e., in slave replication mode) as during normal transaction processing.
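To illustrate the cleanup itself, the sketch below derives the first log file that is still needed from the undo LWM and deletes the older ones. The instant-to-file-number decoding and the log file naming are assumptions I made for the example (LogCounter does the real decoding in Derby); this is not the patch code:

    import java.io.File;

    // Illustrative sketch of undo-LWM based log truncation -- not the actual patch.
    public class LogTruncationSketch {

        // Assumed encoding: log file number in the upper half of the instant.
        static long logFileNumber(long logInstant) {
            return logInstant >>> 32;
        }

        /**
         * Delete every log file that precedes the file containing the undo LWM.
         * Those files can never be needed again: neither redo nor undo has to go
         * further back than the undo LWM, and every data page dirtied by their
         * records has already been flushed by the checkpoint.
         */
        static void deleteObsoleteLogFiles(File logDir, long undoLWM) {
            long firstNeeded = logFileNumber(undoLWM);
            File[] candidates = logDir.listFiles();
            if (candidates == null) {
                return;
            }
            for (File f : candidates) {
                String name = f.getName();
                // Assumed naming: numbered log files look like "log<number>.dat";
                // log.ctrl, logmirror.ctrl etc. fall through the filter untouched.
                if (!name.startsWith("log") || !name.endsWith(".dat")) {
                    continue;
                }
                long number;
                try {
                    number = Long.parseLong(name.substring(3, name.length() - 4));
                } catch (NumberFormatException nfe) {
                    continue;   // not a numbered log file
                }
                if (number < firstNeeded) {
                    // A failed delete is harmless: the file is never read again,
                    // it merely keeps occupying disk space until the next attempt.
                    f.delete();
                }
            }
        }
    }

The essential point, which the two sketches together try to show, is that the deletion only happens after the dirty pages have been flushed and log.ctrl has been updated to the new checkpoint.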
Once replication has successfully started, the slave database will always be
recoverable [3], but not in the case of corrupted data blocks [4]. You may at
any time crash the Derby instance serving the slave database and then reboot
it. The former slave database will then recover to a transaction-consistent
state that includes the modifications made by every transaction whose commit
log record was written to disk on the slave before the crash.
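To make that concrete with a (hypothetical) usage example: assume the slave database is named 'slaveDB' and the slave JVM has just been killed. Booting the database with an ordinary embedded connection URL runs normal crash recovery, after which the tables contain exactly the changes from the transactions whose commit log record had reached the slave's disk before the crash:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Hypothetical example -- database and table names are made up.
    public class BootFormerSlave {
        public static void main(String[] args) throws Exception {
            // Load the embedded driver (not needed on JDBC 4.0 and later).
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");

            // An ordinary boot of the former slave database triggers crash
            // recovery: redo from the latest checkpoint, then undo of any
            // transactions that had not committed on the slave.
            Connection conn = DriverManager.getConnection("jdbc:derby:slaveDB");

            // The database is now transaction consistent.
            Statement s = conn.createStatement();
            ResultSet rs = s.executeQuery("SELECT COUNT(*) FROM replicated_table");
            if (rs.next()) {
                System.out.println("Rows visible after recovery: " + rs.getLong(1));
            }
            rs.close();
            s.close();
            conn.close();
        }
    }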
Please follow up if you think I have misunderstood anything or have not
answered your questions well enough.
[1] The patch only applies to slave replication mode. Backup is not affected,
so as not to break the "fully recoverable" feature for backups.
[2] The first log record of the oldest transaction in the checkpoint's
transaction table.
[3] If "fully" recoverable means recovering in the presence of corrupted data
blocks, this is currently not supported for replication.
[4] Not including jar files, as explained in DERBY-3552.
> Number of log files (and log dir size) on the slave increases continuously
> --------------------------------------------------------------------------
>
> Key: DERBY-3562
> URL: https://issues.apache.org/jira/browse/DERBY-3562
> Project: Derby
> Issue Type: Bug
> Components: Replication
> Affects Versions: 10.4.0.0, 10.5.0.0
> Environment: -
> Reporter: Ole Solberg
> Assignee: Jørgen Løland
> Attachments: derby-3562-1a.diff, derby-3562-1a.stat,
> master_slave-db_size-6.jpg
>
>
> I did a simple test inserting tuples in a table during replication:
> The attached file 'master_slave-db_size-6.jpg' shows that
> the size of the log directory (and number of files in the log directory)
> increases continuously during replication, while on the master the size
> (and number of files) never exceeds ~12Mb (12 files?) in this scenario.
> The seg0 directory on the slave stays at the same size as the master
> seg0 directory.