Re: [MarkLogic Dev General] Rsync-Like DB Contents Comparison and Update?

David Gorbet Mon, 29 Jan 2018 10:32:38 -0800

..actually if it’s been disconnected too long (i.e. the journals no longer 
exist), it does a bulk replication which is not necessarily a full copy. It’s a 
delta copy of what’s missing.
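Here is a toy Python model of the two catch-up paths described in this thread: replaying missed journal frames, versus a bulk replication that sends only the missing or different documents. It is a sketch under stated assumptions, not MarkLogic's implementation. The `Master`, `Replica`, and `catch_up` names, the in-memory journal, and the rule "replay only if every missed frame is still retained" are all hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    # Hypothetical stand-in for a master database (not a MarkLogic API).
    docs: dict = field(default_factory=dict)     # uri -> content
    journal: list = field(default_factory=list)  # retained (frame, uri, content)
    next_frame: int = 1

    def write(self, uri, content):
        self.docs[uri] = content
        self.journal.append((self.next_frame, uri, content))
        self.next_frame += 1

    def truncate_journal(self, keep_last):
        # Simulate older journal frames being discarded.
        self.journal = self.journal[-keep_last:] if keep_last else []


@dataclass
class Replica:
    docs: dict = field(default_factory=dict)
    last_frame: int = 0  # last journal frame this replica has applied


def catch_up(master, replica):
    """Bring the replica up to date and return 'journal' or 'bulk'."""
    oldest = master.journal[0][0] if master.journal else master.next_frame
    if replica.last_frame + 1 >= oldest:
        # Every frame the replica missed is still retained: replay them.
        for frame, uri, content in master.journal:
            if frame > replica.last_frame:
                replica.docs[uri] = content
        mode = "journal"
    else:
        # Frames are gone: send only documents that are missing or differ,
        # and drop documents the master no longer has (a delta, not a full copy).
        for uri, content in master.docs.items():
            if replica.docs.get(uri) != content:
                replica.docs[uri] = content
        for uri in list(replica.docs):
            if uri not in master.docs:
                del replica.docs[uri]
        mode = "bulk"
    replica.last_frame = master.next_frame - 1
    return mode
```

For example, a replica that falls behind only slightly catches up via "journal". Once `truncate_journal` has discarded frames it never saw, the next `catch_up` takes the "bulk" path and still ends with the same documents as the master.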

Note though that with Database replication you’re subject to replication lag 
limits that you may need to monitor. Also, adding and removing replicas is a 
configuration change that may be more heavyweight than adding a flexrep target.

From: [email protected] 
[mailto:[email protected]] On Behalf Of Ron Hitchens
Sent: Monday, January 29, 2018 7:32 AM
To: MarkLogic Developer Discussion <[email protected]>
Subject: Re: [MarkLogic Dev General] Rsync-Like DB Contents Comparison and 
Update?

   I would suggest looking at direct, low-level database replication if your 
copies can be read-only and your goal is exact duplicate databases.  In this 
case MarkLogic keeps the databases in sync by sending low-level journal frames 
rather than syncing individual documents.  If a slave is disconnected for a 
while it will quickly catch up as the master sends the frames it’s missed.  If 
it’s been disconnected too long, or is newly connected, a zero-day full copy is 
sent (depending on your bandwidth, 3GB is not a lot of data to send).

https://docs.marklogic.com/guide/database-replication

----
Ron Hitchens [email protected], +44 7879 358212

On January 27, 2018 at 6:13:12 PM, Eliot Kimber 
([email protected]) wrote:
ML 9

I have a system of servers where a master server gets new remote servers 
allocated to it more or less randomly and dynamically.

The remote servers need to have a correct copy of a database on the master 
server, but the database is pretty big (the previously mentioned 380K-doc, 3GB 
database).

I can of course sync it with FlexRep but when a new server comes available I 
don't know what the current state of its local copy of the database is (if it 
has one at all) so I'm forced to recreate my master server's replication 
targets and do a full push, which takes an hour or two.

In the case where the remote server already has a copy of the database, I would 
like to be able to compare its contents to the master's and determine what the 
deltas are, if any, and handle only those, which would usually be only a few 
docs out of the total set.

Does there exist this kind of rsync or git-like comparison mechanism, either 
out of the box or as a public project?

I'm thinking of something comparable to what git does, which is to create 
hashes of each file and then compare the hashes.
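The git-style comparison can be sketched in a few lines of Python. Hash each document on both sides, then compare the two URI-to-digest maps to get the documents to add, update, and delete. This assumes, hypothetically, that each server can produce its documents as a URI-to-bytes map with identical serialization on both sides. How to extract that map from MarkLogic is not shown, and `digest_map` and `diff` are illustrative names, not an existing API.

```python
import hashlib


def digest_map(docs):
    """Map each URI to the SHA-256 hex digest of its serialized bytes.

    Assumes both servers serialize a given document to identical bytes;
    otherwise identical documents would show up as false mismatches.
    """
    return {uri: hashlib.sha256(body).hexdigest() for uri, body in docs.items()}


def diff(master_hashes, replica_hashes):
    """Return (to_add, to_update, to_delete) URI lists for the replica."""
    master_uris = set(master_hashes)
    replica_uris = set(replica_hashes)
    to_add = sorted(master_uris - replica_uris)
    to_delete = sorted(replica_uris - master_uris)
    to_update = sorted(
        uri for uri in master_uris & replica_uris
        if master_hashes[uri] != replica_hashes[uri]
    )
    return to_add, to_update, to_delete
```

For example, if the master holds /a.xml, /b.xml and /c.xml, and the replica holds the same /a.xml, an older /b.xml and a stale /old.xml, `diff` returns `(['/c.xml'], ['/b.xml'], ['/old.xml'])`. In a real deployment only the digest maps would need to be exchanged for the comparison. Document bodies would be sent only for the add and update lists.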

I could do this in XQuery but I suspect something more efficient could be done 
at the forest level, if one knew what one was doing.
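One way to make the comparison cheaper than exchanging a digest for every one of the ~380K documents is a swapped-in technique similar in spirit to a Merkle tree. It is not a MarkLogic feature. Group the per-document digests (the hashes Eliot describes) into a fixed number of buckets and compare one digest per bucket first. Only documents in mismatched buckets then need a per-document comparison. The bucket count of 256 and all function names below are hypothetical choices for illustration.

```python
import hashlib
from collections import defaultdict


def bucket_of(uri, n_buckets):
    # Hypothetical helper: derive the bucket from a SHA-256 of the URI (not
    # Python's per-process salted hash()), so both servers agree on it.
    return int(hashlib.sha256(uri.encode("utf-8")).hexdigest(), 16) % n_buckets


def bucket_digests(doc_hashes, n_buckets):
    """Combine a URI -> digest map into one digest per non-empty bucket."""
    buckets = defaultdict(list)
    for uri, h in doc_hashes.items():
        buckets[bucket_of(uri, n_buckets)].append(f"{uri}\t{h}")
    result = {}
    for b, entries in buckets.items():
        # Sort entries so the bucket digest does not depend on iteration order.
        joined = "\n".join(sorted(entries)).encode("utf-8")
        result[b] = hashlib.sha256(joined).hexdigest()
    return result


def mismatched_buckets(master_hashes, replica_hashes, n_buckets=256):
    """Return the bucket numbers whose combined digests differ."""
    m = bucket_digests(master_hashes, n_buckets)
    r = bucket_digests(replica_hashes, n_buckets)
    return sorted(b for b in set(m) | set(r) if m.get(b) != r.get(b))
```

Each changed, added, or removed document affects exactly one bucket. With only a few changed documents out of 380K, at most that many of the 256 buckets differ, so the second round compares only a small fraction of the per-document digests.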

Thanks,

Eliot
--
Eliot Kimber
http://contrext.com

_______________________________________________
General mailing list
[email protected]
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general
