The amount of data that we're accumulating by keeping the old versions around does not bother us.
It will bother you if you never allow the database to merge. If you're keeping a small window of history this will work fine. (Though an errant timestamp setting by a config script will delete your history.) If you need to keep the entire history, you will effectively be disabling merging with this strategy, which will certainly land you in trouble. Merges are good; the database needs them to optimize its internal data structures to support fast and consistent ingest and queries.

Save each version as a separate document. Put all versions of a single document in a collection to represent the "logical" document and give each instance version a unique URI to represent its version number. You could even create a special "latest" collection that contains only the latest version of each document. This will allow you to do queries like,

"How many versions of (logical) document ABC.xml do I have?"
"What's the latest version of (logical) ABC.xml?"
"Run this diff code on latest ABC.xml and its previous version."

With timestamps you'll have to know _when_ a document was updated in order to get its previous version. This will require two steps for every query and won't allow you to do any queries across versions, because the older/newer versions don't exist, from the perspective of a query that runs at a single timestamp.

Justin

On Jun 29, 2016, at 8:49 PM, Hans Hübner <[email protected]> wrote:

On Wed, Jun 29, 2016 at 11:29 PM, Danny Sokolsky <[email protected]> wrote:

It might be tempting to treat point-in-time queries for generic versioning, but it is usually not what you want. Does that help to clarify?

Thanks, Danny, this helps. In our use case, we have thousands of relatively complex trees of nodes, and the configuration of each tree changes over time, when new data is inserted into the database.
In order to make old configurations of each tree available for inspection, we use the MVCC point-in-time rollback feature of our current database system to recover previous database states and visualize them. This is merely a diagnostic feature, but given the relative complexity of the connections between the tree nodes, it is helpful to be able to visualize the changes to each tree that happened when new data was inserted.

The amount of data that we're accumulating by keeping the old versions around does not bother us. This database is a special purpose database tied to a particular application, and it won't be used to insert random other documents. It thus seems to me that we'll be fine with using the MVCC feature for our history visualization for now. If we decide that the space overhead is prohibitive, we can always adjust the merge timestamp, trading off history depth against database space used.

It would be helpful to have the tradeoffs that one has to make when using the "Time Travel" feature be listed in the documentation.

-Hans

--
LambdaWerk GmbH
Oranienburger Straße 87/89
10178 Berlin
Phone: +49 30 555 7335 0
Fax: +49 30 555 7335 99

HRB 169991 B Amtsgericht Charlottenburg
USt-ID: DE301399951
Geschäftsführer: Hans Hübner

http://lambdawerk.com/

_______________________________________________
General mailing list
[email protected]
Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
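The collection-based versioning scheme suggested in the reply above (one document per version, one collection per logical document, plus a "latest" collection) can be sketched roughly as follows. This is a plain-Python, in-memory stand-in and does not use any MarkLogic API; the class name `VersionStore`, its methods, and the `/<name>/v<N>` URI layout are all hypothetical choices made for illustration.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class VersionStore:
    """In-memory stand-in for a document store with URIs and collections."""
    docs: dict = field(default_factory=dict)         # URI -> document content
    collections: dict = field(default_factory=dict)  # logical name -> URIs, oldest first
    latest: set = field(default_factory=set)         # URIs of the newest version of each document

    def save_version(self, logical_name: str, content: str) -> str:
        """Store content as a new document instead of overwriting the old one."""
        history = self.collections.setdefault(logical_name, [])
        if history:
            # The previous version stays stored but leaves the "latest" collection.
            self.latest.discard(history[-1])
        uri = f"/{logical_name}/v{len(history) + 1}"  # hypothetical URI layout
        self.docs[uri] = content
        history.append(uri)
        self.latest.add(uri)
        return uri

    def version_count(self, logical_name: str) -> int:
        """How many versions of this logical document do I have?"""
        return len(self.collections.get(logical_name, []))

    def latest_version(self, logical_name: str):
        """URI of the latest version, or None if the document is unknown."""
        history = self.collections.get(logical_name, [])
        return history[-1] if history else None

    def diff_latest_with_previous(self, logical_name: str) -> list:
        """Unified diff between the latest version and the one before it."""
        history = self.collections.get(logical_name, [])
        if len(history) < 2:
            return []
        old, new = history[-2], history[-1]
        return list(difflib.unified_diff(
            self.docs[old].splitlines(),
            self.docs[new].splitlines(),
            fromfile=old, tofile=new, lineterm="",
        ))


store = VersionStore()
store.save_version("ABC.xml", "<tree>\n<node id='1'/>\n</tree>")
store.save_version("ABC.xml", "<tree>\n<node id='1'/>\n<node id='2'/>\n</tree>")
print(store.version_count("ABC.xml"))   # 2
print(store.latest_version("ABC.xml"))  # /ABC.xml/v2
print("\n".join(store.diff_latest_with_previous("ABC.xml")))
```

Because every version is an ordinary document, all three example questions are answered in a single step, with no need to know when an update happened. In a real database the collections and URIs would be maintained by the application at insert time, and merges could run normally.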
