I took a look at PostgreSQL, and its capability of restoring its WAL files and 
then switching to streaming mode for replication once those are exhausted is 
almost exactly what I want.

From its manual:

http://www.postgresql.org/docs/9.3/static/warm-standby.html

" At startup, the standby begins by restoring all WAL available in the archive 
location, calling restore_command. Once it reaches the end of WAL available 
there and restore_command fails, it tries to restore any WAL available in the 
pg_xlog directory. If that fails, and streaming replication has been 
configured, the standby tries to connect to the primary server and start 
streaming WAL from the last valid record found in archive or pg_xlog. If that 
fails or streaming replication is not configured, or if the connection is later 
disconnected, the standby goes back to step 1 and tries to restore the file 
from the archive again. This loop of retries from the archive, pg_xlog, and via 
streaming replication goes on until the server is stopped or failover is 
triggered by a trigger file."
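The retry loop described in the quoted manual text could be sketched roughly as 
follows (all names here are hypothetical, purely to illustrate the control flow; 
this is neither PostgreSQL nor Derby code):

```java
// Illustrative sketch of the PostgreSQL standby retry loop quoted above:
// try the archive, then the local log directory, then streaming, and when
// streaming drops, start over from the archive.
public class StandbyLoop {
    enum Source { ARCHIVE, LOCAL_LOG_DIR, STREAMING }

    // Given the current WAL source and whether it just succeeded,
    // return the source the standby should use next.
    static Source nextSource(Source current, boolean succeeded) {
        if (succeeded) return current;              // keep using the working source
        switch (current) {
            case ARCHIVE:       return Source.LOCAL_LOG_DIR;
            case LOCAL_LOG_DIR: return Source.STREAMING;
            default:            return Source.ARCHIVE; // streaming failed: retry from step 1
        }
    }

    public static void main(String[] args) {
        Source s = Source.ARCHIVE;
        s = nextSource(s, false);   // archive exhausted -> local log dir
        s = nextSource(s, false);   // local log dir exhausted -> streaming
        System.out.println(s);      // STREAMING
        s = nextSource(s, false);   // connection dropped -> back to the archive
        System.out.println(s);      // ARCHIVE
    }
}
```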

Just thinking outside of the box and not knowing Derby's internals yet, I could 
see something like:

- a slave connects to the master and issues the equivalent of "start 
replication"
- the master uses something similar to the online backup, but instead of 
writing the backup to a file, it writes the backup to a stream which is 
transported to the slave.  Simultaneously it also starts the replication 
stream, writing the log entries to the slave.  These are done at the same time 
because, as far as I can tell, when an online database backup is started, the 
backed-up data is consistent as of when the backup begins (i.e. changes made 
to the database while the online backup is occurring are not present in the 
backup), so if the backup and the shipping of the replication logs are started 
at the same time, then once the backup is complete, it plus any replication 
log entries represent a consistent state of the database
- the slave would process the backup stream, creating the database, until this 
is complete.  Simultaneously it would be receiving the replication logs and 
persisting them.  Once the backup is completely received, it would process the 
persisted replication logs and then continue to process any new replication 
logs as they arrive.
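The slave-side behavior above could be sketched like this (class and method 
names are hypothetical, not Derby internals; just to show the buffer-then-drain 
idea):

```java
// Sketch of the slave side: log records that arrive while the streamed
// backup is still being restored are buffered, then replayed in order once
// the backup completes. All names here are hypothetical.
import java.util.ArrayDeque;
import java.util.Queue;

public class SlaveCatchUp {
    final Queue<String> pendingLog = new ArrayDeque<>();
    boolean backupComplete = false;
    final StringBuilder applied = new StringBuilder();

    // Called for every replication log record received from the master.
    void onLogRecord(String record) {
        if (backupComplete) {
            apply(record);            // normal steady-state replication
        } else {
            pendingLog.add(record);   // persist until the backup is restored
        }
    }

    // Called once the streamed backup has been fully restored locally.
    void onBackupComplete() {
        backupComplete = true;
        while (!pendingLog.isEmpty()) {
            apply(pendingLog.remove()); // replay buffered records in order
        }
    }

    void apply(String record) { applied.append(record).append(';'); }

    public static void main(String[] args) {
        SlaveCatchUp slave = new SlaveCatchUp();
        slave.onLogRecord("txn1");         // arrives during the backup -> buffered
        slave.onLogRecord("txn2");
        slave.onBackupComplete();          // replays txn1, txn2
        slave.onLogRecord("txn3");         // applied immediately
        System.out.println(slave.applied); // txn1;txn2;txn3;
    }
}
```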

I understand that this might take a while to complete and would require storage 
at the slave to persist the replication logs while processing the backup from 
the master.  I also understand that the equivalent of the online backup might 
slow the master down while this is occurring, but I think having the ability to 
bring up a slave without any downtime on the master would be a great feature.

I think internally, Derby already has most of what is needed to accomplish this:

   - It already has the ability to perform an online backup.  What would need 
to be added is the ability to write the backup data over a network connection 
instead of to filesystem storage
   - It already has the ability to perform asynchronous replication using the 
recovery log.  What would need to be added, on the slave side, is the ability 
to buffer the log and not process it until a consistent backup has been 
received.
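To make the first point concrete, here is a rough sketch of what "write the 
backup over a network connection" might look like as an abstraction.  The 
`BackupSink` interface and both factory methods are hypothetical, not Derby 
APIs; the point is only that the backup engine would write pages to a sink 
without caring whether the bytes land on disk or on a socket to the slave:

```java
// Hypothetical sketch: decouple the online backup from filesystem storage
// so the same backup code can stream pages to a slave over the network.
import java.io.*;

public class StreamedBackup {
    // Hypothetical abstraction the backup engine would write pages to.
    interface BackupSink extends Closeable {
        void write(byte[] page) throws IOException;
    }

    // Existing behavior: backup pages go to local filesystem storage.
    static BackupSink fileSink(File dir, String name) throws IOException {
        OutputStream out = new FileOutputStream(new File(dir, name));
        return new BackupSink() {
            public void write(byte[] page) throws IOException { out.write(page); }
            public void close() throws IOException { out.close(); }
        };
    }

    // Proposed behavior: backup pages go to any stream, e.g. a socket.
    static BackupSink streamSink(OutputStream out) {
        return new BackupSink() {
            public void write(byte[] page) throws IOException { out.write(page); }
            public void close() throws IOException { out.close(); }
        };
    }

    public static void main(String[] args) throws IOException {
        // A ByteArrayOutputStream stands in for the network connection here.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (BackupSink sink = streamSink(wire)) {
            sink.write(new byte[] {1, 2, 3}); // backup pages flow to the "network"
        }
        System.out.println(wire.size()); // 3
    }
}
```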

Any thoughts on this?

-----Original Message-----
From: Bergquist, Brett [mailto:[email protected]] 
Sent: Tuesday, January 14, 2014 12:49 PM
To: [email protected]
Subject: RE: Question on recoverying after replication break because of a 
system failure

Actually the expensive part is having the "master" system down to ensure that a 
completely accurate copy of the database is made on the "slave".  Note that my 
"master" here could actually be the (original) slave system once the (original) 
master system is repaired.

Derby's replication seems to be okay once the systems are in sync and running.  
It is the initial setup needed to make the "slave" database the same as the 
"master" database that is expensive, because currently (unless I am wrong, and 
correct me here if so) the master cannot be modified while this is occurring.  
Likewise, restoring the replication state once a failed system is repaired is 
again expensive.

I guess I will look at how other databases handle this case.  I can't imagine 
that adding a "replication slave" requires the master database to be down and 
quiescent.  I would imagine that it is possible to add a "replication slave" 
while the "replication master" is hot and running.  This is what I would like 
Derby to be able to do (note that I am not asking someone else to do it, as it 
could very well be a contribution from me).

An analogy would be replacing a failed disk in a RAID array.  The RAID array 
continues to operate with the failed disk still installed.  Then the failed 
disk is removed and a new one is installed.  Access to the RAID array is not 
blocked while the RAID rebuilds the data onto the replacement disk.

It would be really useful for Derby to operate similarly, whereby the 
replicated database can be rebuilt in the background.  Note that while this is 
being done, replication is degraded (not operating, of course, with the current 
one-to-one replication), just as a RAID array is while a disk is being 
resilvered; but once this process is done, replication is back up and running.

-----Original Message-----
From: Rick Hillegas [mailto:[email protected]]
Sent: Tuesday, January 14, 2014 11:40 AM
To: [email protected]
Subject: Re: Question on recoverying after replication break because of a 
system failure

Hi Brett,

I'm afraid that I'm not following your proposal. Some comments inline...

On 1/10/14 1:45 PM, Bergquist, Brett wrote:
>
> The reason I am posting to the dev list is that I might want to look 
> into improving Derby in this area.
>
> Just so that I understand correctly, the steps for replication are:
>
> *Make a copy of the database to the slave
>
This seems to be the expensive step which results in long downtime.
>
> *Start replication on the slave and on the master
>
> Now assume that this is working right along and all is well, and then 
> the system with the master fails.   So replication is broken, and the 
> slave can be restarted in non-replication mode.   Time goes along 
> and changes are made to the non-replicated database on the slave.   
> Finally the master machine is brought back online.
>
> So to get replication going we need to:
>
> *Copy the database from the slave to the master
>
> *Start replication on the slave and on the master
>
> This assumes that we have an affinity for having the master being the 
> master but even if this is not the case and the old slave is going to 
> become the new master, we need to copy the database from the slave to 
> the master before starting replication again.
>
> Given a database that is fairly large (say on the order of 200Gb) and 
> not a Gig connection between the master and slave, the transfer could 
> take a fairly long time.   Unfortunately during 
> this transfer time, neither database can be used.    So while 
> replication allows quick failover after an initial failure, 
> re-establishing the replication once the failure has been resolved can 
> cause substantially long downtime.
>
> So my question: is there any way that this downtime can be reduced?   
> Could something be done with restoring a backup database, using the 
> logs, and then enabling replication?     Something like:
>
> *Make a file system level backup of the slave (using something like 
> freeze and ZFS snapshot, this can take only a couple of seconds) and 
> then allow the slave to continue
>
> oAssuming that the database logs are being used so that they can be 
> replayed later
>
> *Transfer the database to the master
>
I don't understand how this step is different from the expensive step you want 
to eliminate.

Thanks,
-Rick
>
> *Transfer the logs
>
> oReplay each log on the master somehow to get the master to catch up 
> to the slave as closely as possible
>
> *Stop the slave so that it becomes consistent
>
> *Transfer the last log to the master and replay the master log
>
> *Enable replication on the master and the slave
>
> Basically limiting the downtime while the database transfer and log 
> file transfer are taking place, and then having a small window of 
> downtime where the databases need to get in sync before replication 
> can be started again.
>
> Any thoughts on this?   Is this an approach that is worth looking at?
>
