[
http://issues.apache.org/jira/browse/DERBY-239?page=comments#action_12316434 ]
Suresh Thalamati commented on DERBY-239:
----------------------------------------
I think providing an online backup mechanism that does not block changes to the
database while the backup is in progress would be a useful feature for Derby
users, especially in client/server environments. This backup mechanism might
take more time than the current online backup because of the synchronization
overhead required to allow changes to the database while the backup is in
progress. At this point I am not sure how much more time it will take, but I
expect it should be no more than 50% in the worst case.
The current online backup mechanism (which blocks changes to the database) is
supported through system procedures (e.g. SYSCS_UTIL.SYSCS_BACKUP_DATABASE).
My plan is to make the existing backup procedures work without blocking
changes to the database; no new system procedures are required. If the
community thinks both blocking and non-blocking backups are useful, new
procedures can be added.
Currently a backup contains mainly the data files (seg0/*) and the transaction
log files (log/*) that exist when the backup starts. On restore from the
backup, transactions are replayed, similar to crash recovery, to bring the
database to a consistent state. The new online backup will work the same way,
except that the transaction log must be copied to the backup only after all
the data files have been backed up.
I think the current implementation freezes the database (allows no changes)
during backup for the following reasons:
1) Data files are in a stable state; the backup will not contain partially
updated pages on disk.
2) No data files are added or deleted on disk, because create/drop operations
are blocked.
3) No transaction commits after the backup starts, so all unlogged operations
will be rolled back.
If the database is not frozen, the above conditions no longer hold, which
could lead to backups that are corrupted or inconsistent. I think it is not
necessary to freeze the whole database to make a stable backup copy; by
blocking operations that modify the on-disk files for short periods of time,
a stable backup can be made.
The following sections explain some of the issues and possible ways to address
them, to provide a truly online backup that does not block changes to the
database for the whole duration of the backup.
1) Corrupt pages in the backup database:
Backup reads and page cache writes can be interleaved if the database is not
frozen, i.e. it is possible to end up with a page in the backup in which one
portion is more up-to-date than the rest, if page cache writes are not blocked
while the page is being read for the backup. To prevent the backup process
from reading partially written pages, some synchronization mechanism is needed
that does not allow a page to be read for writing to the backup while the same
page is being written to disk. This can be implemented by one of the
following approaches:
a) Latch on a page key (container id, page number) both while writing the page
from the cache to disk and while reading the page from the disk/cache to
write to the backup. This approach has the small overhead of acquiring an
extra latch during page cache writes while the backup is in progress.
or
b) Read each page into the page cache first, and then latch the page in the
cache until a temporary copy of it is made. This approach avoids the
overhead of extra latches on page keys during writes, but it will pollute
the page cache with pages that are needed only by the backup; this might
impact user operations, because active user pages may be replaced by backup
pages in the page cache.
or
c) Read pages into the buffer pool and latch them while making a copy,
similar to the above approach, but somehow ensure that user pages are not
evicted from the buffer pool.
One possible optimization is to copy the file on disk as-is to the backup,
but keep track of pages that are modified while the file is being copied and
rewrite those pages using one of the above latching mechanisms.
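As a minimal sketch of approach (a), not Derby's actual buffer manager code
(all class and method names here are hypothetical): keep one latch per page
key, and have both the cache writer and the backup reader acquire it, so the
backup can never observe a half-written page.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch of per-page latching keyed by (container id, page number).
class PageLatchSketch {
    private final ConcurrentHashMap<String, ReentrantLock> latches =
            new ConcurrentHashMap<>();

    private ReentrantLock latchFor(long containerId, long pageNumber) {
        return latches.computeIfAbsent(containerId + ":" + pageNumber,
                                       k -> new ReentrantLock());
    }

    // Page cache writer: holds the page latch while flushing the page to disk.
    void writePageToDisk(long containerId, long pageNumber, Runnable diskWrite) {
        ReentrantLock latch = latchFor(containerId, pageNumber);
        latch.lock();
        try {
            diskWrite.run();
        } finally {
            latch.unlock();
        }
    }

    // Backup reader: takes the same latch, so it never copies a torn page.
    byte[] readPageForBackup(long containerId, long pageNumber,
                             Supplier<byte[]> diskRead) {
        ReentrantLock latch = latchFor(containerId, pageNumber);
        latch.lock();
        try {
            return diskRead.get();
        } finally {
            latch.unlock();
        }
    }
}
```

The extra cost on the write path is one uncontended lock acquisition per page
flush, which only matters while a backup is actually running.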
2) Committed non-logged operations:
The basic requirement for a consistent database backup is that, after the
checkpoint taken for the backup, all changes to the database are available in
the transaction log. But Derby provides some non-logged operations for
performance reasons, for example CREATE INDEX and IMPORT into an empty table.
This was not an issue in the old backup mechanism, because no operations
could commit once the backup started, so any non-logged operations would be
rolled back, just as in regular crash recovery.
I can think of three ways to address this issue:
a) Block non-logged operations while the backup is in progress, and also make
the backup wait to start copying until any in-flight non-logged operations
are complete.
b) Make the backup always wait for non-logged operations to complete, and
re-copy any files affected by a non-logged operation if they were already
backed up.
c) Somehow trigger logging for all operations from the checkpoint for the
backup until the backup is complete. This is easy to implement for
non-logged operations that start after the backup begins, but the tricky
case is triggering logging for non-logged operations that started before
the backup but commit while it is in progress.
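Option (a) can be sketched with a reader/writer gate (again hypothetical
names, not Derby code): non-logged operations hold the shared side for their
duration, and the backup's copy phase takes the exclusive side, which both
waits for in-flight non-logged operations to drain and blocks new ones.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of option (a): gate non-logged operations against backup.
class UnloggedOpGate {
    private final ReentrantReadWriteLock gate = new ReentrantReadWriteLock();

    // A non-logged operation (e.g. CREATE INDEX) holds the shared side while
    // it runs; many such operations may proceed concurrently.
    void runUnloggedOperation(Runnable op) {
        gate.readLock().lock();
        try {
            op.run();
        } finally {
            gate.readLock().unlock();
        }
    }

    // The backup copy phase takes the exclusive side: it waits for in-flight
    // non-logged operations to finish and blocks new ones until copying is done.
    void runBackupCopyPhase(Runnable copy) {
        gate.writeLock().lock();
        try {
            copy.run();
        } finally {
            gate.writeLock().unlock();
        }
    }
}
```

The cost falls entirely on non-logged operations that happen to overlap a
backup; normal logged transactions never touch the gate.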
3) Dropping a table while its file on disk is being backed up:
Dropping a table results in deletion of the file on disk, but the deletion
will fail if the file is open for backup. Some form of synchronization is
required to make sure users do not see spurious errors in this case.
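One way this synchronization could work, sketched with hypothetical names, is
to defer the physical delete: if the backup currently holds a container's
file open, the drop records a pending delete that the backup performs when it
releases the file, so the user's DROP TABLE succeeds without error.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch: defer on-disk deletes for files the backup has open.
// (A real implementation would need to close the race between the membership
// check and the pending-delete registration, e.g. under a container lock.)
class DropCoordinator {
    final List<Long> deleted = new CopyOnWriteArrayList<>(); // stand-in for real unlinks
    private final Set<Long> beingBackedUp = ConcurrentHashMap.newKeySet();
    private final Set<Long> pendingDeletes = ConcurrentHashMap.newKeySet();

    void beginBackupOf(long containerId) {
        beingBackedUp.add(containerId);
    }

    void endBackupOf(long containerId) {
        beingBackedUp.remove(containerId);
        // Perform any delete that was deferred while the backup held the file.
        if (pendingDeletes.remove(containerId)) {
            deleteFile(containerId);
        }
    }

    // DROP TABLE path: defer the on-disk delete while the backup has the file open.
    void dropContainer(long containerId) {
        if (beingBackedUp.contains(containerId)) {
            pendingDeletes.add(containerId);
        } else {
            deleteFile(containerId);
        }
    }

    private void deleteFile(long containerId) {
        deleted.add(containerId); // would unlink seg0/cNNN.dat in a real system
    }
}
```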
4) Creating a table/index after the data files are backed up:
The recovery system expects the file on disk to exist before log records that
refer to it are written to the transaction log. I think roll-forward recovery
already handles this case, but it should be tested.
5) Data file growth due to inserts while the file (table/index) is being
backed up:
The recovery system expects a page to be allocated on disk before log records
about that page are written to the transaction log, to avoid recovery errors
caused by space issues, except in the case of roll-forward recovery. I think
roll-forward recovery already handles this case, but we have to make sure it
works here as well; test cases should be added. Some form of synchronization
is required to make a stable snapshot of the file if it is growing while the
backup is in progress.
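A simple way to get a stable snapshot of a growing file, shown here as a
hypothetical sketch rather than Derby code, is to fix the page count when the
file's backup starts and copy only those pages; pages appended afterwards are
reconstructed from the transaction log by roll-forward recovery, since the
log is copied after all data files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical sketch: snapshot only the pages that existed when the copy began.
class GrowingFileCopy {
    // Copy pages [0, pageCountAtStart) even if the file keeps growing; each
    // page read would go through whatever page-level latching is chosen, so
    // no torn page is copied. Later pages come from log replay on restore.
    static List<byte[]> snapshot(int pageCountAtStart, IntFunction<byte[]> readPage) {
        List<byte[]> out = new ArrayList<>();
        for (int page = 0; page < pageCountAtStart; page++) {
            out.add(readPage.apply(page));
        }
        return out;
    }
}
```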
6) Checkpoints while the backup is in progress:
I think it is not necessary to allow checkpoints while the backup is in
progress. But if someone thinks otherwise, the following should be addressed:
a) Make a copy of the log control file for the backup before copying any
data files.
b) Any operation that relies on a checkpoint to make itself consistent
should not be allowed, because the backup might already have copied some
files when the checkpoint happens.
Any comments/suggestions will be appreciated.
Thanks
-suresh
> Need an online backup feature that does not block update operations while
> online backup is in progress.
> --------------------------------------------------------------------------------------------------------
>
> Key: DERBY-239
> URL: http://issues.apache.org/jira/browse/DERBY-239
> Project: Derby
> Type: New Feature
> Components: Store
> Versions: 10.1.1.0
> Reporter: Suresh Thalamati
> Assignee: Suresh Thalamati
>
> Currently Derby allows users to perform online backups using the
> SYSCS_UTIL.SYSCS_BACKUP_DATABASE() procedure, but while the backup is in
> progress, update operations are temporarily blocked, while read operations
> can still proceed.
> Blocking update operations can be a real issue, specifically in client/server
> environments, because user requests will be blocked for a long time if a
> backup is in progress on the server.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira