Thanks for the input, Øystein. My comments are in-line.

Øystein Grøvlen wrote:

"ST(" == Suresh Thalamati (JIRA) <[email protected]> writes:

   ST(> Currently backup contains mainly data files (seg0/*) and the
   ST(> transaction log files(log/*) that are there when the backup
   ST(> started. On restore from the backup, transactions are
   ST(> replayed, similar to crash-recovery to bring the database to
   ST(> a consistent state. New online backup also will work same
   ST(> way, except that all the transaction log must be copied to
   ST(> the backup, only after all the data files are backed up.

I guess the idea is to do a log switch when the data files have been
copied and copy all log from before this log switch.

That's what I plan to do, except that the log switch will not happen until it is time to determine the last log file that needs to go into the backup, after copying the data files and the rest of the log files. From the user's perspective, Derby online backup will include all the transactions that were committed before the backup completed, i.e. if the backup starts at 9PM and ends at 10PM, the backup will contain data approximately up to 10PM, not 9PM.

My understanding of what goes into the backup is that the data in the backup need not match an exact point in real time. Please correct me if this assumption is wrong.



One will have to
prevent any log files that are needed from being garbage collected
before they are copied.  I guess that can be achieved by running
backup in a transaction and log the start of backup.  That way, the
open backup transaction should prevent garbage-collection of relevant
log files.


I agree that by writing a log record for the start of backup, we can prevent garbage collection of log files. My initial thought is to simply disable garbage collection of log files for the duration of the backup, unless there are some specific advantages to writing a backup-start log record.
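To make the alternative concrete, here is a minimal sketch of the "disable garbage collection during backup" idea. This is not Derby's actual log-factory API; the class and method names are invented for illustration:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: gate log-file garbage collection behind a backup-in-progress
// flag instead of logging a backup-start record.
class LogFactorySketch {
    private final AtomicBoolean backupInProgress = new AtomicBoolean(false);

    // Hypothetical: oldest log file still needed for crash recovery.
    private final long firstLogFileNeeded = 3;

    void startBackup() { backupInProgress.set(true); }
    void endBackup()   { backupInProgress.set(false); }

    /** Returns true if the given log file may be deleted. */
    boolean canGarbageCollect(long logFileNumber) {
        if (backupInProgress.get()) {
            return false;   // backup keeps every log file alive
        }
        return logFileNumber < firstLogFileNeeded;
    }
}
```

The trade-off versus a backup-start log record is that a flag like this is purely in-memory, so it needs no recovery handling, but it also leaves no trace in the log if the backup or the server dies mid-copy.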



One question I have is whether compensation log records (CLRs) in Derby
are self-contained.  If they depend on the original non-CLR records,
log records for transactions that were rolled back and terminated long
before the backup finishes will be needed in order to redo the CLRs.


I think CLRs are not self-contained in Derby. A CLR log record contains the log instant of the original operation log record that was rolled back.


   ST(> To avoid backup process reading partial written pages, some
   ST(> kind of synchronization mechanism that does not allow reading
   ST(> a page to write to the backup when the same page is
   ST(> being written to the disk.  This can be implemented by one of
   ST(> the following approaches:

   ST(> a) By latching on a page key (container id, page number)
   ST(>    while doing the write of the page from cache to disk and
   ST(>    while reading the page from the disk/cache to write to the
   ST(>    backup. This approach has small overhead of acquiring an
   ST(>    extra latch during the page cache writes when the backup
   ST(>    is in progress.

If I understand correctly, you are here talking about not reading
pages into the page cache but copying them to the backup some
other way.  I do not think that sounds like a good idea since you
will have to care about the pages that are already in the cache
anyway.

I agree with you. I was thinking along these lines in order to use the JDK 1.4.2 file-to-file transfer mechanisms to get a super-fast backup :-)

However, I do not understand the part about an extra latch.  Don't you
have to latch a page while it is written to disk anyhow to prevent
concurrent updates during the write?
Yes, the page is in a latched state when it is written to the disk. But this latching mechanism is based on the page objects in the page cache; unless the backup also goes through the page cache, it cannot latch the same page object.
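Approach (a) above, latching on the page key rather than the page object, could look something like the following sketch. None of these names come from Derby's code; the idea is just that the cache writer and the backup reader serialize on a lock keyed by (container id, page number), so the backup never observes a half-written page:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: one lock per page key, shared by the cache manager's disk
// writes and the backup's page reads.
class PageKeyLatches {
    private final ConcurrentHashMap<String, ReentrantLock> latches =
            new ConcurrentHashMap<>();

    private ReentrantLock latchFor(long containerId, long pageNumber) {
        return latches.computeIfAbsent(containerId + ":" + pageNumber,
                                       k -> new ReentrantLock());
    }

    /** Cache manager: hold the page-key latch while flushing the page. */
    void writePageToDisk(long containerId, long pageNumber, Runnable diskWrite) {
        ReentrantLock latch = latchFor(containerId, pageNumber);
        latch.lock();
        try { diskWrite.run(); } finally { latch.unlock(); }
    }

    /** Backup: hold the same latch while copying the page to the backup. */
    void copyPageToBackup(long containerId, long pageNumber, Runnable backupCopy) {
        ReentrantLock latch = latchFor(containerId, pageNumber);
        latch.lock();
        try { backupCopy.run(); } finally { latch.unlock(); }
    }
}
```

This matches the "small overhead" note in the quoted text: the extra lock on the write path only matters while a backup is in progress, and could be skipped entirely otherwise.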

   ST(> 2) Committed Non logged operation:
   ST(>    Basic requirement to have consistent database backup is
   ST(>    after the checkpoint for the backup all changes to the
   ST(>    database will be available in the transaction log.  But
   ST(>    Derby provides some non logged operations for performance
   ST(>    reasons, for example CREATE INDEX, IMPORT to an empty
   ST(>    table ..etc.
   ST(>    This was not an issue in the old backup mechanism because no
   ST(>    operations will be committed once the backup starts. So any
   ST(>    non logged operations will be rolled back similar to the
   ST(>    regular crash recovery.

Generally, we cannot give a guarantee that operations that are
performed during backup are reflected in the backup.  If I have
understood correctly, transactions that commit after the data copying
is finished will not be reflected.  Since a user will not be able to
distinguish between operations committed during data copying and
operations committed during log copying, he cannot be sure concurrent
operations are reflected in the backup.


I agree with you that one cannot absolutely guarantee that the backup will include all operations committed up to a particular time. But the backup design depends on the transaction log to bring the database to a consistent state, because while the data files are being copied it is possible that some of the pages are written to the disk. So we need the transaction log at least until the data files are copied.

If a user commits a non-logged operation while the data files are being copied, he/she would expect it to be in the backup, just like a logged operation. Please note that non-logged operations in Derby are not explicit to the users; most of the non-logging work is done by the system without the user's knowledge.



This is not more of an issue for the new backup mechanism than it is
currently for roll-forward recovery.  Roll-forward recovery will not be
able to recover non-logged operations either.

Yes, roll-forward recovery has the same issue. Once the log archive mode required for roll-forward recovery is enabled, all operations are logged, including operations that are not normally logged, like CREATE INDEX. But I think Derby currently does not handle this correctly: it does not force logging for non-logged operations that were started before log archive mode is enabled.

If users need that, we
should provide logged versions of these operations.

I think that during backup, non-logged operations should either be logged by the system or be blocked. If users are really concerned about performance, they will not execute them in parallel with the backup.

   ST(> 6) checkpoints when the backup is in progress.

   ST(>    I think it not necessary to allow checkpoints when the backup is in
   ST(>    progress. But if some one thinks otherwise , following should
   ST(>    be addressed:

Are checkpoints the only way to write data pages to disk?  If yes,
backup cannot block checkpoints, since then the backup would no longer
be non-blocking once the entire page cache becomes dirty.

No, pages are also written to the disk by the cache manager to prevent a cache-full state. I believe checkpoints can run in parallel with the backup if I make a copy of the log control file at the start of the backup and disable log-file garbage collection. On second thought, I don't see any reason for not allowing checkpoints during backup.
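Pulling the pieces of this thread together, the overall backup sequence being discussed might be outlined as below. This is only a sketch; the `Store` interface and every method on it are invented, not Derby's actual internal API:

```java
// Hypothetical outline of the online backup sequence: snapshot the log
// control file, block log garbage collection, copy data files (with
// checkpoints allowed in parallel), then fix the final log boundary
// with a log switch and copy the log files.
class OnlineBackupSketch {
    interface Store {
        void copyLogControlFile(String dest);
        void setLogGarbageCollection(boolean enabled);
        void copyDataFiles(String dest);
        long switchLogFile();                        // last log file to copy
        void copyLogFiles(String dest, long upToFile);
    }

    static void backup(Store store, String dest) {
        store.copyLogControlFile(dest);          // snapshot of log metadata
        store.setLogGarbageCollection(false);    // keep every log file alive
        try {
            store.copyDataFiles(dest);           // checkpoints may run here
            long last = store.switchLogFile();   // fix the final boundary
            store.copyLogFiles(dest, last);      // log covers the copy window
        } finally {
            store.setLogGarbageCollection(true);
        }
    }
}
```

The ordering is the point: because the log switch happens only after the data files are copied, the copied log spans the whole data-copy window, which is what makes replay on restore bring the backup to a consistent state.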

Thanks
-suresh

