Changeset: 4d456a4a0434 for MonetDB
URL: http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=4d456a4a0434
Added Files:
        monetdb5/modules/mal/replication.mx
Modified Files:
        tools/merovingian/ChangeLog.Jul2012
        tools/merovingian/client/monetdb.1
        tools/merovingian/daemon/forkmserver.c
        tools/merovingian/daemon/merovingian.c
        tools/merovingian/utils/properties.c
Branch: replicationms
Log Message:

replicationms: initial commit (backout of 877b04706e12)

Bring back master-slave replication work in replicationms branch.


diffs (truncated from 1653 to 300 lines):

diff --git a/monetdb5/modules/mal/replication.mx b/monetdb5/modules/mal/replication.mx
new file mode 100644
--- /dev/null
+++ b/monetdb5/modules/mal/replication.mx
@@ -0,0 +1,1490 @@
+@/
+The contents of this file are subject to the MonetDB Public License
+Version 1.1 (the "License"); you may not use this file except in
+compliance with the License. You may obtain a copy of the License at
+http://www.monetdb.org/Legal/MonetDBLicense
+
+Software distributed under the License is distributed on an "AS IS"
+basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the
+License for the specific language governing rights and limitations
+under the License.
+
+The Original Code is the MonetDB Database System.
+
+The Initial Developer of the Original Code is CWI.
+Portions created by CWI are Copyright (C) 1997-July 2008 CWI.
+Copyright August 2008-2012 MonetDB B.V.
+All Rights Reserved.
+@
+
+@f replication
+
+@c
+/*
+ * @a Martin Kersten
+ * @v 1.0
+ * @+ Database replication
+ * MonetDB supports a simple database replication scheme using a master-slave
+ * protocol. A master node keeps a log of all SQL updates for replay.
+ * Once a slave starts, the master establishes
+ * a MAL-client connection to the slave and starts pumping the backlog
+ * of committed transactions.
+ * The master does not take any responsibility for the integrity of a slave.
+ * The master may, however, decide to suspend
+ * forwarding updates to prepare for e.g. administration or shutdown.
+ *
+ * It is the slave's responsibility to be resilient against duplicate
+ * transmission of the MAL-update backlog. A transaction id
+ * can be given to catch up from transactions already replayed.
+ * Transaction ids before the minimum available in the log
+ * directory lead to freezing the slave; rebuilding from
+ * scratch is then required.
+ *
+ * The replication scheme does not support SQL schema modifications.
+ * Instead, the slaves should be initialized with a complete copy
+ * of the master schema and the database.
+ *
+ * Turning an existing database into a master and creation of a single
+ * slave works as follows.
+ *
+ * step 1) Turn the database into a replication master by setting its
+ * "master" property to true using monetdb(1).  This property is translated
+ * by merovingian(1) into the database variable "replication_master" and is
+ * set upon database (re)start.  Note that this setting cannot be added to a
+ * running database.
+ *
+ * step 2) Create a dump of the master database using the msqldump(1) tool.
+ *
+ * step 3) To initiate a slave, simply load the master snapshot.
+ *
+ * step 4) Run monetdb(1) to turn the database into a slave by setting its "slave" property to the URI of the master.
+ * The precise URI can be obtained by issuing the command
+ * 'mclient -lmal -dmaster -s"u := master.getURI(); io.printf(\"%s\n\", u);"' on the master.
+ * The slave property is translated by merovingian(1) into the database variable "replication_slave"
+ * and is set upon database (re)start.  Note that this setting cannot be added to a running database.
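
The four steps above can be sketched as a shell session. The invocations below are illustrative assumptions only: the database names "dbmaster" and "dbslave" are hypothetical, the property spellings follow the monetdb(1) conventions described above, and the master URI reuses the example host appearing later in this file.

```shell
# Sketch of steps 1-4; names and exact option spellings are assumptions,
# not taken from this changeset.
monetdb stop dbmaster
monetdb set master=true dbmaster      # step 1: mark the database as master
monetdb start dbmaster                # property takes effect on (re)start

msqldump -d dbmaster > snapshot.sql   # step 2: dump the master database

mclient -d dbslave snapshot.sql       # step 3: load the snapshot into the slave

monetdb stop dbslave                  # step 4: point the slave at the master
monetdb set slave="mapi:monetdb://gio.ins.cwi.nl:50000/dbmaster" dbslave
monetdb start dbslave
```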
+ *
+ * The slave starts synchronizing with the master automatically upon each session restart.
+ * A few SQL wrapper procedures and functions can be used to control it manually.
+ * For example, the slave can temporarily suspend receiving log replays using suspendSync()
+ * and reactivate it afterwards with resumeSync().
+ * A resumeSync() is also needed after you create a relation already known by the master,
+ * because the master may already have sent updates for it; since the target
+ * table was unavailable, the slave closed the log stream.
+ *
+ * The function freezeSlaves() removes the log files and makes sure that all
+ * existing slaves won't be able to catch up other than by re-initializing the
+ * database using e.g. a checkpoint.
+ * @verbatim
+ * CREATE PROCEDURE suspendSync() EXTERNAL NAME slave."stop";
+ * CREATE PROCEDURE resumeSync() EXTERNAL NAME slave."sync";
+ * CREATE FUNCTION synchronizing() RETURNS boolean EXTERNAL NAME slave."synchronizing";
+ *
+ * CREATE PROCEDURE freezeSlaves() EXTERNAL NAME master."freeze";
+ * CREATE PROCEDURE suspendSlaves() EXTERNAL NAME master."stop";
+ * CREATE PROCEDURE resumeSlaves() EXTERNAL NAME master."start";
+ * CREATE FUNCTION master() RETURNS string EXTERNAL NAME master."getURI";
+ * CREATE FUNCTION cutOffTag() RETURNS string EXTERNAL NAME master."getCutOffTag";
+ * @end verbatim
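
Once these wrappers are installed, a session on the slave might control synchronization as sketched below; the table name is hypothetical and the suspend/resume pattern is an assumed usage of the procedures defined above.

```sql
-- Suspend log replay before creating a relation the master already knows
-- about, then resume; synchronizing() reports whether replay is active.
CALL suspendSync();
CREATE TABLE local_scratch (i INTEGER);  -- hypothetical local table
CALL resumeSync();
SELECT synchronizing();
```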
+ *
+ * It is possible to make a slave database also a master for descendants.
+ * In such a situation the database carries both a master and a slave property.
+ * Such a scheme makes it possible to employ hierarchical replication, or to
+ * have additional tables available in the replication stream.  Note that
+ * at this point replication from multiple masters, e.g. to combine a full
+ * set from a set of partitioned masters, is not yet possible.
+ *
+ * Beware, turning off the "master" property leads to automatic removal of all
+ * left-over log files.  This renders the master database unusable for replication,
+ * and the state of the slaves becomes frozen.
+ * To restore replication in such a case, both the master and the
+ * slaves have to be reinitialised using the aforementioned steps.
+ *
+ * @- Behind the scenes
+ * When the replication_master environment is set, an optimizer
+ * becomes active to look after updates on SQL tables and to prepare
+ * for producing the log files. The snippet below illustrates the
+ * modifications made to a query plan.
+ *
+ * @verbatim
+ * function query():void
+ *   master:= "mapi:monetdb://gio.ins.cwi.nl:50000/dbmaster";
+ *   fcnid:= master.open();
+ *   ...
+ *   sql.append("schema","table","col",b:[:oid,:int]);
+ *   master.append("schema","table","col",b,fcnid);
+ *   ...
+ *   t := mtime.current_timestamp();
+ *   master.close(fcnid,t);
+ * end query;
+ * @end verbatim
+ *
+ * At runtime this leads to buffers being filled with the statements
+ * required for the slaves to catch up.
+ * Each query block is stored in its own buffer and sent at
+ * the end of the query block. This separates the concurrent
+ * actions on the database at the master and leads to a serial
+ * execution of the replication operations within the slave.
+ *
+ * The log records are stored in a file "dbfarm/db/master/log%d-%d" with the
+ * following structure:
+ * @verbatim
+ * function slave.tag1(transactionid:int,stamp:timestamp);
+ *   barrier doit:= slave.open(transactionid);
+ *     sql.transaction();
+ *     tag1_b := bat.new(:oid,:int);
+ *     ...
+ *     bat.insert(tag1_b,3:oid,232:int); #example update
+ *     ...
+ *     sql.append("schema","table","col",tag1_b,tag);
+ *     slave.close(transactionid,stamp);
+ *     sql.commit();
+ *   exit doit;
+ * end tag1;
+ * slave.tag1(1,"2009-09-03 15:49:45.000":timestamp);
+ * slave.drop("tag1");
+ * @end verbatim
+ *
+ * The slave.open() simply checks the replica log administration table
+ * and ignores duplicate attempts to roll the database forward.
+ *
+ * The operations are executed in the same serial order as on the master,
+ * which should lead to the same optimistic transactional behavior.
+ * All queries are considered running in auto-commit mode, because
+ * the SQL frontend does not provide the hook (yet) for better transaction
+ * boundary control.
+ * The transaction identifier is part of the call to the function
+ * with the transaction update details.
+ * @- Interaction protocol
+ * The master node simply waits for a slave to request the transmission of the missing log files.
+ * The request includes the URI of the slave and the user credentials needed to establish a connection.
+ * The last parameter is the last known transaction id successfully re-executed.
+ * The master forks a thread to start flushing the backlog files.
+ *
+ * Grouping the operations in temporary MAL functions
+ * makes it easy to skip a function's execution when we detect
+ * that it has been executed before.
+ *
+ * @- Log file management
+ * The log records are grouped into separate files.
+ * They are the units for re-submission and the scheme is set up to be idempotent.
+ * A slave always starts synchronizing using the maximal tag stored in the slave log.
+ *
+ * The log files ultimately pollute your database and have to
+ * be (re)moved. This is considered a responsibility of the DBA,
+ * for it involves making a checkpoint or securely storing the logs
+ * in an archive. It can be automated by asking all slaves
+ * for their last transaction id and purging all obsolete files.
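
The purge automation described above can be sketched as a small shell helper. It is a hypothetical illustration, assuming only the "log%d-%d" file naming introduced earlier; the function name, the directory argument, and the way the minimum tag is obtained are all invented for the example.

```shell
# purge_logs MINTAG DIR: remove log files named log<lo>-<hi> in DIR whose
# upper tag <hi> is below MINTAG, the minimum transaction id that every
# slave has reported as successfully re-executed (obtained elsewhere).
purge_logs() {
    mintag=$1
    dir=$2
    for f in "$dir"/log*-*; do
        [ -e "$f" ] || continue     # no matching files: nothing to do
        hi=${f##*-}                 # upper tag encoded in the file name
        if [ "$hi" -lt "$mintag" ]; then
            rm "$f"                 # every slave is past this file
        fi
    done
    return 0
}
```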
+ *
+ * Any error recognized during the replay should freeze the slave,
+ * because the synchronization integrity might become compromised.
+ *
+ * Aside from being limited to auto-commit transactions, the current
+ * implementation scheme has a hole. The log record is written just
+ * before transaction commit, including the activation call.
+ * The call and the flush of the commit record to the SQL
+ * log should be one atomic action, which amounts to a commit
+ * sequence of two 'databases'. It can only be handled when
+ * the SQL commit becomes visible at the MAL layer.
+ * [ Or, inject the transaction approval record into the log file
+ * when the next query starts, checking for any transaction
+ * errors first.]
+ *
+ * COPY INTO commands cause the master to freeze the images of
+ * all slaves, because capturing the input file and forwarding it to
+ * the slaves seems overly complicated.
+ *
+ * The slave invalidation scheme is rather crude. The log directory
+ * is emptied and a new log file is created. Subsequent attempts
+ * by the slaves to access transaction ids before the invalidation
+ * are flagged as errors.
+ *
+ * @- Wishlist
+ * After setting the slave property, the slave could initiate full synchronization
+ * by asking for a catalog dump and replaying the logs, provided they
+ * have been kept around since the start.
+ * Alternatively, we can use the infrastructure for Octopus to pull the data from the master.
+ * For both we need msqldump functionality in the SQL code base.
+ *
+ * A slave property can be set to a list of masters, which turns
+ * the slave into a server for multiple sources. It calls for splitting
+ * the slavelog.
+ *
+ * The tables in the slave should be set read-only; otherwise we
+ * have to double-check integrity and bail out of replication on violation.
+ * One solution is to store the replicated database in its own
+ * schema and grant read access to all users.
+ * [show example how to set up ]
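
As a sketch of the suggested setup, keeping the replicated tables in a dedicated schema with read-only access for ordinary users; the schema and table names below are hypothetical.

```sql
-- Replicated tables live in their own schema; ordinary users get
-- read-only access, so they cannot violate the slave's integrity.
CREATE SCHEMA replica;
CREATE TABLE replica.orders (id INTEGER, total DECIMAL(10,2));
GRANT SELECT ON replica.orders TO PUBLIC;
```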
+ *
+ * A validation script (or database diff) might be helpful to
+ * assess the database content for possible integrity violations.
+ */
+@mal
+module master;
+
+command open():oid
+address MASTERopen
+comment "Create a replication record";
+
+command close(tag:oid):void
+address MASTERclose
+comment "Close the replication record";
+
+command start():void
+address MASTERstart
+comment "Restart synchronisation with the slaves";
+
+command stop():void
+address MASTERstop
+comment "Stop synchronisation of the slaves";
+
+command freeze():void
+address MASTERfreeze
+comment "Invalidate all copies maintained at slaves";
+
+pattern append(mvc:ptr, s:str, t:str, c:str, :any_1, tag:oid):ptr
+address MASTERappendValue
+comment "Dump the scalar on the MAL log";
+
+pattern append(mvc:ptr, s:str, t:str, c:str, b:bat[:oid,:any_1], tag:oid):ptr
+address MASTERappend
+comment "Dump the BAT on the MAL log";
+
+pattern delete(s:str, t:str, b:bat[:oid,:any_1], tag:oid):void
+address MASTERdelete
+comment "Dump the BAT with deletions on the MAL log";
+
+pattern copy(sname:str, tname:str, tsep:str, rsep:str, ssep:str, ns:str, fname:str, nr:lng, offset:lng, tag:oid):void
+address MASTERcopy
+comment "A copy command leads to invalidation of the slave's image. A dump restore will be required.";
+
+pattern replay(uri:str, usr:str, pw:str, tag:oid):void
+address MASTERreplay
+comment "Slave calls the master to restart sending the missing transactions
+from a certain point as a named user.";
+
+command sync(uri:str, usr:str, pw:str, tag:oid):void
+address MASTERsync
+comment "Login to slave with credentials to initiate submission of the log records";
+
+command getURI():str
+address MASTERgetURI
+comment "Return the URI for the master";
+
+command getCutOffTag():oid
+address MASTERgetCutOffTag
+comment "Return the cutoff tag for transaction synchronization";
+
+command prelude():void
+address MASTERprelude
+comment "Prepare the server for the master role, or remove any leftover log files.";
+
+module slave;
+
+command sync():void
+address SLAVEsyncDefault
+comment "Login to master with environment credentials to initiate submission of the log records";
+
+command sync(uri:str):void
+address SLAVEsyncURI
+comment "Login to master with admin credentials to initiate submission of the log records";
+
+command sync(uri:str, usr:str, pw:str, tag:oid):void
+address SLAVEsync
+comment "Login to master uri with admin credentials to initiate submission of the log records";
+
+command stop():void
+address SLAVEstop
+comment "Slave suspends synchronisation with master";
_______________________________________________
Checkin-list mailing list
[email protected]
http://mail.monetdb.org/mailman/listinfo/checkin-list