>>>>> "ST" == Suresh Thalamati <[EMAIL PROTECTED]> writes:
ST> Thanks for the input, Øystein. My comments are in-line. Øystein
ST> Grøvlen wrote:
>> I guess the idea is to do a log switch when the data files have been
>> copied and copy all log from before this log switch.
ST> That's what I plan to do, except that the log switch will not
ST> happen until after the data files have been copied, when it is
ST> time to determine the last log file that needs to go into the
ST> backup; then the rest of the log files are copied.
Good idea. Then the uncertainty of what really has made it into the
backup is limited to just a few seconds before it completed.
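The sequence described above could be sketched roughly as follows. This is only an illustration of the ordering being discussed; none of the class or method names are real Derby APIs.

```java
// Hypothetical sketch of the online-backup sequence under discussion:
// copy data files first, then do the log switch, then copy the log.
import java.util.ArrayList;
import java.util.List;

public class BackupSketch {
    static List<String> steps = new ArrayList<>();

    static void copyDataFiles() { steps.add("copy data files"); }
    static void switchLogFile() { steps.add("log switch: determine last log file"); }
    static void copyLogFiles()  { steps.add("copy log files up to the switch point"); }

    public static void main(String[] args) {
        // 1. Copy the data files while transactions keep running.
        copyDataFiles();
        // 2. Only then switch the log, so the backup also covers
        //    transactions committed while the data files were copied.
        switchLogFile();
        // 3. Copy every log file up to and including the switch point.
        copyLogFiles();
        System.out.println(steps);
    }
}
```

Doing the log switch last is what limits the uncertainty window to the short log-copying phase at the end of the backup.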
ST> From the user perspective, Derby online backup will include
ST> all the transactions that were committed during the backup;
ST> i.e., if the backup starts at 9PM and ends at 10PM, the
ST> backup will have data approximately up until 10PM, not 9PM.
ST> My understanding of what goes into the backup is that data
ST> in the backup need not match an exact point in real time.
ST> Please correct me if this is a wrong assumption.
I agree with you.
>> One will have to
>> prevent any log files that are needed from being garbage collected
>> before they are copied. I guess that can be achieved by running
>> backup in a transaction and logging the start of backup. That way, the
>> open backup transaction should prevent garbage-collection of relevant
>> log files.
>>
>>
ST> I agree that by writing a log record for the start of backup, we can
ST> prevent garbage-collection of log files.
ST> My initial thought is to simply disable garbage-collection of log
ST> files for the duration of the backup, unless there are some specific
ST> advantages to writing a backup-start log record.
Disabling garbage-collection directly is probably the cleanest way to
do this.
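A minimal sketch of what "disabling garbage-collection for the duration of the backup" might look like, assuming a checkpoint that asks the log subsystem which old log files may be deleted. All names and the example file numbers are hypothetical, not Derby's actual code.

```java
// Hypothetical sketch: hold back log-file garbage collection while a
// backup is in progress, so all log files the backup may copy survive.
public class LogGcSketch {
    private boolean backupInProgress = false;
    private long backupStartLogFile = -1;

    public synchronized void startBackup(long currentLogFile) {
        backupInProgress = true;
        backupStartLogFile = currentLogFile;
    }

    public synchronized void endBackup() { backupInProgress = false; }

    // Checkpoint asks: up to which log file number may old files be deleted?
    public synchronized long deletableUpTo(long checkpointNeedsFrom) {
        if (backupInProgress) {
            // Refuse to advance the deletion point past anything the
            // backup might still need (including older log for CLRs).
            return Math.min(checkpointNeedsFrom, backupStartLogFile);
        }
        return checkpointNeedsFrom;
    }

    public static void main(String[] args) {
        LogGcSketch gc = new LogGcSketch();
        gc.startBackup(7);
        System.out.println(gc.deletableUpTo(10)); // 7: backup holds files back
        gc.endBackup();
        System.out.println(gc.deletableUpTo(10)); // 10: normal GC resumes
    }
}
```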
How will you determine where to start the redo scan at recovery? Do
you need some mark in the log for that purpose?
>> One question I have is whether compensation log records (CLRs) in Derby
>> are self-contained. If they depend on the original non-CLR log
>> records, then log records for transactions that were rolled back and
>> terminated long before the backup finished will be needed to be able
>> to redo the CLRs.
>>
ST> I think CLRs are not self-contained in Derby. CLR log
ST> records contain the log instant of the original operation's
ST> log record that was rolled back.
If garbage-collection is disabled, this probably does not matter, but
it means that a start-of-backup log record alone is not sufficient to
prevent garbage-collection of relevant log records.
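The dependency being discussed could be illustrated like this: a CLR records only the log instant of the operation it undoes, so redoing the CLR requires that the original record is still available in the log. The types and values below are purely illustrative, not Derby's actual log record layout.

```java
// Hypothetical illustration of why CLRs that reference the original
// record's log instant are not self-contained: redo of the CLR needs
// the original log record to still exist.
public class ClrSketch {
    record LogRecord(long instant, String payload) {}
    // A CLR carries no undo payload of its own, only a back-pointer.
    record Clr(long instant, long undoneInstant) {}

    public static void main(String[] args) {
        LogRecord original = new LogRecord(100, "insert row (1,'a') into page 12");
        // Rollback writes a CLR that references instant 100 instead of
        // embedding the undo information itself.
        Clr clr = new Clr(250, original.instant());
        System.out.println("CLR at " + clr.instant()
                + " needs log record at " + clr.undoneInstant());
    }
}
```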
>> Generally, we cannot guarantee that operations that are
>> performed during backup are reflected in the backup. If I have
>> understood correctly, transactions that commit after the data copying
>> is finished will not be reflected. Since a user will not be able to
>> distinguish between operations committed during data copying and
>> operations committed during log copying, he cannot be sure concurrent
>> operations are reflected in the backup.
>>
>>
ST> I agree with you that one cannot absolutely guarantee that
ST> the backup will include operations committed up to a
ST> particular time. But the backup design depends on the
ST> transaction log to bring the database to a consistent state,
ST> because while the data files are being copied, it is possible
ST> that some of the pages are written to disk. So we need the
ST> transaction log at least until the data files are copied.
ST> If a user commits a non-logged operation while the data
ST> files are being copied, he/she would expect it to be in the
ST> backup, just like a logged operation.
My point was that a user will not be able to distinguish between the
data file copying period and the log copying period. Hence, he does
not know whether his operation was committed while the data files were
being copied.
ST> Please note that non-logged operations in Derby are not
ST> explicit to the users; most non-logged work is done by
ST> the system without the user's knowledge.
I understand.
>> This is not more of an issue for a new backup mechanism than it is
>> currently for roll-forward recovery. Roll-forward recovery will not be
>> able to recover non-logged operations either.
ST> Yes, roll-forward recovery has the same issue: once the log
ST> archive mode that is required for roll-forward recovery is
ST> enabled, all operations are logged, including operations
ST> that are not normally logged, like create index. But I think
ST> Derby currently does not handle this correctly; it does not
ST> force logging for non-logged operations that were started
ST> before log archive mode is enabled.
The cheapest way to handle non-logged operations that started before
backup/archive mode was enabled is to just make them fail and roll them
back. I think that would be an acceptable solution.
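The "fail and roll back" policy could be sketched as below: unlogged operations already running when the backup starts are aborted, and new ones are rejected while the backup is active. The class and method names are hypothetical, not Derby code.

```java
// Hypothetical sketch of blocking non-logged operations during backup
// and aborting those that were started before backup mode was enabled.
import java.util.HashSet;
import java.util.Set;

public class UnloggedOpPolicy {
    private final Set<String> runningUnloggedOps = new HashSet<>();
    private boolean backupMode = false;

    public synchronized void beginUnloggedOp(String txId) {
        if (backupMode)
            throw new IllegalStateException("unlogged operations blocked during backup");
        runningUnloggedOps.add(txId);
    }

    // Entering backup mode returns the unlogged operations that must be
    // rolled back because they started before the backup.
    public synchronized Set<String> enterBackupMode() {
        backupMode = true;
        Set<String> toAbort = new HashSet<>(runningUnloggedOps);
        runningUnloggedOps.clear();
        return toAbort;  // caller rolls these transactions back
    }

    public static void main(String[] args) {
        UnloggedOpPolicy p = new UnloggedOpPolicy();
        p.beginUnloggedOp("tx1");
        System.out.println(p.enterBackupMode());  // prints [tx1]
        try {
            p.beginUnloggedOp("tx2");
        } catch (IllegalStateException e) {
            System.out.println("blocked: " + e.getMessage());
        }
    }
}
```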
>> If users need that, we
>> should provide logged versions of these operations.
>>
>>
ST> I think that during backup, non-logged operations should either be
ST> logged by the system or blocked.
I think blocking them should be acceptable to most users.
ST> If users are really concerned about performance, they will
ST> not execute such operations in parallel with the backup.
This advice may work for backup, but not for enabling roll-forward
recovery. If I were a user concerned with performance, I think
I would prefer to still create an index unlogged and rather recreate
it if recovery is needed. (I guess this would require roll-forward
recovery to ignore updates to non-existing indexes.) I could limit
the vulnerability by making a backup after unlogged operations have
been performed.
By the way, how is normal recovery of unlogged operations handled? Is
the commit of unlogged operations delayed until all data pages created
by the operation have been flushed to disk?
--
Øystein