Re: [MarkLogic Dev General] Bulk updates (xqsync vs. mlcp)

Justin Makeig Thu, 30 Jun 2016 09:30:52 -0700

> The amount of data that we're accumulating by keeping the old versions around 
> does not bother us.


It will bother you if you never allow the database to merge. If you're keeping 
a small window of history this will work fine. (Though an errant timestamp 
setting by a config script will delete your history.) If you need to keep the 
entire history, you will effectively be disabling merging with this strategy, 
which will certainly land you in trouble. Merges are good; the database needs 
them to optimize its internal data structures to support fast and consistent 
ingest and queries.

Save each version as a separate document. Put all versions of a single document 
in a collection to represent the "logical" document and give each version a 
unique URI that encodes its version number. You could even create a 
special "latest" collection that contains only the latest version of each 
document. This will allow you to do queries like, "How many versions of 
(logical) document ABC.xml do I have?" "What's the latest version of (logical) 
ABC.xml?" "Run this diff code on latest ABC.xml and its previous version." With 
timestamps you'll have to know _when_ a document was updated in order to get 
its previous version. This will require two steps for every query and won't 
allow you to do any queries across versions, because the older/newer versions 
don't exist, from the perspective of a query that runs at a single timestamp.
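
To make the collection-per-logical-document idea concrete, here is a minimal sketch in Python. It is only an in-memory toy and does not use any MarkLogic API. The class name VersionStore, the "/versions/<name>/<n>" URI scheme, and the method names are all assumptions made up for illustration.

```python
from collections import defaultdict


class VersionStore:
    """Toy in-memory stand-in for a document database with collections."""

    def __init__(self):
        self.docs = {}                       # URI -> content
        self.collections = defaultdict(set)  # collection name -> set of URIs
        self.versions = defaultdict(list)    # logical name -> URIs, oldest first

    def save(self, logical, content):
        """Store content as a new version document and return its URI."""
        number = len(self.versions[logical]) + 1
        uri = f"/versions/{logical}/{number}"  # hypothetical URI scheme
        self.docs[uri] = content
        # Every version joins the collection that represents the logical document.
        self.collections[logical].add(uri)
        # Move the "latest" marker from the previous version to this one.
        if self.versions[logical]:
            self.collections["latest"].discard(self.versions[logical][-1])
        self.collections["latest"].add(uri)
        self.versions[logical].append(uri)
        return uri

    def count(self, logical):
        """How many versions of the logical document exist?"""
        return len(self.collections[logical])

    def latest(self, logical):
        """URI of the newest version of the logical document."""
        return self.versions[logical][-1]

    def previous(self, logical):
        """URI of the version before the newest one, or None if there is none."""
        uris = self.versions[logical]
        return uris[-2] if len(uris) > 1 else None


store = VersionStore()
store.save("ABC.xml", "<tree rev='a'/>")
store.save("ABC.xml", "<tree rev='b'/>")
print(store.count("ABC.xml"), store.latest("ABC.xml"), store.previous("ABC.xml"))
```

All three example questions are answered from data that is visible at the same moment, so no timestamp bookkeeping is needed to find the previous version.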

Justin

On Jun 29, 2016, at 8:49 PM, Hans Hübner wrote:

On Wed, Jun 29, 2016 at 11:29 PM, Danny Sokolsky wrote:
It might be tempting to use point-in-time queries for generic versioning, but 
it is usually not what you want.

Does that help to clarify?

Thanks, Danny, this helps.  In our use case, we have thousands of relatively 
complex trees of nodes, and the configuration of each tree changes over time, 
when new data is inserted into the database.  In order to make old 
configurations of each tree available for inspection, we use the MVCC 
point-in-time rollback feature of our current database system to recover 
previous database states and visualize them.  This is merely a diagnostic 
feature, but given the relative complexity of the connections between the tree 
nodes, it is helpful to be able to visualize the changes to each tree that 
happened when new data was inserted.

The amount of data that we're accumulating by keeping the old versions around 
does not bother us.  This database is a special purpose database tied to a 
particular application, and it won't be used to insert random other documents.  
It thus seems to me that we'll be fine with using the MVCC feature for our 
history visualization for now.  If we decide that the space overhead is 
prohibitive, we can always adjust the merge timestamp, trading off history 
depth against database space used.

It would be helpful to have the tradeoffs that one has to make when using the 
"Time Travel" feature be listed in the documentation.

-Hans

--
LambdaWerk GmbH
Oranienburger Straße 87/89
10178 Berlin
Phone: +49 30 555 7335 0
Fax: +49 30 555 7335 99

HRB 169991 B Amtsgericht Charlottenburg
USt-ID: DE301399951
Geschäftsführer:  Hans Hübner

http://lambdawerk.com/


_______________________________________________
General mailing list
[email protected]
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general
