Just to wrap up this thread: I incorrectly and unhelpfully conflated two aspects of merging, consolidating stands and getting rid of obsolete (deleted or replaced) fragments. Setting a merge timestamp affects only the latter; regardless of the timestamp, the database will always consolidate stands for you. Again, the docs cover this very well: <https://docs.marklogic.com/guide/admin/merges>. (Hat tip, Jason Hunter and Danny Sokolsky.)
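To make that distinction concrete, here is a toy Python model. It is an illustration only. `Fragment`, `merge`, and `visible_at` are hypothetical names, not MarkLogic APIs. The retention rule is a simplification of what the merges guide linked above describes.

In the model, each fragment records two timestamps: the one at which it became visible and, once it is replaced or deleted, the one at which it stopped being visible. The merge always folds its input stands into one. The merge timestamp only decides whether an obsolete fragment is dropped or kept. `None` stands in for "retain no history".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fragment:
    uri: str
    content: str
    nascent: int                   # timestamp at which this version became visible
    deleted: Optional[int] = None  # timestamp at which it was replaced/deleted; None = live


def merge(stands: list[list[Fragment]], merge_timestamp: Optional[int]) -> list[Fragment]:
    """Consolidate several stands into one and drop obsolete fragments.

    Consolidation happens no matter what merge_timestamp is. The timestamp only
    decides which obsolete fragments survive: one deleted after merge_timestamp
    may still be visible to a point-in-time query at or after that timestamp,
    so it is kept. None models "retain no history".
    """
    merged: list[Fragment] = []
    for stand in stands:
        for frag in stand:
            if frag.deleted is None:
                merged.append(frag)  # live fragments are always kept
            elif merge_timestamp is not None and frag.deleted > merge_timestamp:
                merged.append(frag)  # obsolete, but still inside the retained window
    return merged


def visible_at(stand: list[Fragment], ts: int) -> dict[str, str]:
    """Point-in-time read: each URI resolves to at most one version."""
    return {
        f.uri: f.content
        for f in stand
        if f.nascent <= ts and (f.deleted is None or ts < f.deleted)
    }


stands = [
    [Fragment("/ABC.xml", "v1", nascent=10, deleted=20)],
    [Fragment("/ABC.xml", "v2", nascent=20, deleted=30)],
    [Fragment("/ABC.xml", "v3", nascent=30)],
]

kept = merge(stands, merge_timestamp=25)         # three stands in, one stand out
print([f.content for f in kept])                 # ['v2', 'v3']: v1 was deleted at 20 <= 25
print(visible_at(kept, 25))                      # {'/ABC.xml': 'v2'}
print([f.content for f in merge(stands, None)])  # ['v3']: no history retained
```

Note that `visible_at` returns exactly one version per URI. A query running at a single timestamp cannot see the other versions. Setting the merge timestamp to `None` here (the analogue of a script resetting it) silently discards everything but the live fragment.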
Regardless, I still stand by the recommendation _not_ to use MVCC timestamps
for general-purpose versioning, mostly because of the difficulty of querying
and the potential to screw something up administratively. Sorry for the
confusion.

Justin

--
Justin Makeig
Director, Product Management
MarkLogic
[email protected]

> On Jun 30, 2016, at 11:02 AM, Hans Hübner <[email protected]> wrote:
>
> On Thu, Jun 30, 2016 at 6:30 PM, Justin Makeig <[email protected]> wrote:
>
>> The amount of data that we're accumulating by keeping the old versions
>> around does not bother us.
>
> It will bother you if you never allow the database to merge. If you're
> keeping a small window of history, this will work fine. (Though an errant
> timestamp setting by a config script will delete your history.) If you need
> to keep the entire history, you will effectively be disabling merging with
> this strategy, which will certainly land you in trouble. Merges are good; the
> database needs them to optimize its internal data structures to support fast
> and consistent ingest and queries.
>
> Save each version as a separate document. Put all versions of a single
> document in a collection to represent the "logical" document and give each
> version instance a unique URI to represent its version number. You could even
> create a special "latest" collection that contains only the latest version of
> each document. This will allow you to do queries like "How many versions of
> (logical) document ABC.xml do I have?", "What's the latest version of
> (logical) ABC.xml?", or "Run this diff code on latest ABC.xml and its previous
> version." With timestamps you'll have to know _when_ a document was updated
> in order to get its previous version. This will require two steps for every
> query and won't allow you to do any queries across versions, because the
> older/newer versions don't exist from the perspective of a query that runs
> at a single timestamp.
>
> Thank you for the concrete architectural advice! It does not seem to be very
> bothersome to follow that route, so we will certainly trust you that it is
> better than using MVCC timestamps.
>
> Let me suggest again that the "Time Travel" section in the "Inside MarkLogic"
> document and the section on point-in-time queries in the "Application
> Developer's Guide" be updated to include the caveats that you and your
> colleagues have expressed. I'm still a bit puzzled by the vehemence with
> which you all have discouraged us from using it. Are there any other
> advertised features that can affect the health of a database in a similar
> way and should thus be avoided?
>
> Thanks!
> Hans
>
> --
> LambdaWerk GmbH
> Oranienburger Straße 87/89
> 10178 Berlin
> Phone: +49 30 555 7335 0
> Fax: +49 30 555 7335 99
>
> HRB 169991 B Amtsgericht Charlottenburg
> VAT ID: DE301399951
> Managing Director: Hans Hübner
>
> http://lambdawerk.com/
>
> _______________________________________________
> General mailing list
> [email protected]
> Manage your subscription at:
> http://developer.marklogic.com/mailman/listinfo/general
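The layout Justin recommends in the quoted message uses three pieces: one document per version, a collection per logical document, and a "latest" collection. It can be sketched as a small in-memory Python model.

This is not MarkLogic code. `VersionedStore`, its methods, and the `/ABC.xml/v1`-style URI scheme are hypothetical stand-ins for real document inserts with collection assignments. What the sketch shows is that every version is an ordinary document at the current timestamp. Counting versions, finding the latest one, and diffing it against its predecessor therefore become plain lookups rather than two-step timestamp queries.

```python
import difflib


class VersionedStore:
    """Toy in-memory store: every version is its own document.

    Hypothetical URI scheme: versions of /ABC.xml live at /ABC.xml/v1, /ABC.xml/v2, ...
    """

    LATEST = "latest"  # collection holding only the newest version of each document

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}              # version URI -> content
        self.collections: dict[str, set[str]] = {}  # collection name -> version URIs

    @staticmethod
    def _version_number(version_uri: str) -> int:
        return int(version_uri.rsplit("/v", 1)[1])

    def _collection(self, name: str) -> set[str]:
        return self.collections.setdefault(name, set())

    def save(self, logical_uri: str, content: str) -> str:
        """Insert a new version; never overwrite an existing one."""
        n = len(self._collection(logical_uri)) + 1
        uri = f"{logical_uri}/v{n}"
        self.docs[uri] = content
        self._collection(logical_uri).add(uri)      # membership = the "logical" document
        latest = self._collection(self.LATEST)
        latest.discard(f"{logical_uri}/v{n - 1}")  # previous version is no longer latest
        latest.add(uri)
        return uri

    def versions(self, logical_uri: str) -> list[str]:
        """All versions of a logical document, oldest first."""
        return sorted(self._collection(logical_uri), key=self._version_number)

    def latest(self, logical_uri: str) -> str:
        # Intersection of the logical-document collection and "latest" has one member.
        (uri,) = self._collection(logical_uri) & self._collection(self.LATEST)
        return uri

    def diff_latest(self, logical_uri: str) -> list[str]:
        """Unified diff of the latest version against its predecessor (needs >= 2 versions)."""
        *_, prev, last = self.versions(logical_uri)
        return list(difflib.unified_diff(
            self.docs[prev].splitlines(), self.docs[last].splitlines(),
            fromfile=prev, tofile=last, lineterm=""))


store = VersionedStore()
store.save("/ABC.xml", "<doc>\n  <title>Draft</title>\n</doc>")
store.save("/ABC.xml", "<doc>\n  <title>Final</title>\n</doc>")
store.save("/XYZ.xml", "<doc/>")

print(len(store.versions("/ABC.xml")))  # 2
print(store.latest("/ABC.xml"))         # /ABC.xml/v2
print("\n".join(store.diff_latest("/ABC.xml")))
```

Because old versions are never overwritten, nothing here depends on the merge timestamp. A merge can reclaim space freely without touching the history.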
