To wrap up this thread: I incorrectly and unhelpfully conflated two aspects 
of merging, namely consolidating stands and getting rid of obsolete fragments. 
When you set a merge timestamp, you are only affecting the latter. Regardless 
of the timestamp, the database will always consolidate stands for you. Again, 
the docs cover this well: 
<https://docs.marklogic.com/guide/admin/merges>. (Hat tip to Jason Hunter and 
Danny Sokolsky.)

That said, I still stand by the recommendation to _not_ use MVCC timestamps as 
general-purpose versioning, mostly because querying across versions is 
difficult and because an administrative mistake, such as an errant merge 
timestamp setting, can delete your history.
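
For illustration only, here is a minimal in-memory sketch of the 
document-per-version scheme from my earlier message (quoted below). It is 
plain Python, not MarkLogic code and not a MarkLogic API: the names 
VersionedStore, Doc, diff_latest, the "latest" collection name, and the 
"/vN" URI suffix are all assumptions made up for this example.

```python
import difflib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Doc:
    uri: str        # unique URI per version, e.g. "ABC.xml/v2" (assumed scheme)
    version: int    # 1-based version number within the logical document
    content: str
    collections: set = field(default_factory=set)


class VersionedStore:
    """Toy stand-in for a document database that supports collections."""

    LATEST = "latest"  # hypothetical name for the "latest" collection

    def __init__(self) -> None:
        self.docs = {}  # maps URI -> Doc

    def versions(self, logical_uri: str) -> list:
        """Every version in the logical document's collection, oldest first."""
        found = [d for d in self.docs.values() if logical_uri in d.collections]
        return sorted(found, key=lambda d: d.version)

    def save(self, logical_uri: str, content: str) -> Doc:
        """Store a new version as its own document and move the 'latest' tag."""
        previous = self.versions(logical_uri)
        for old in previous:
            old.collections.discard(self.LATEST)
        number = len(previous) + 1
        doc = Doc(
            uri=f"{logical_uri}/v{number}",
            version=number,
            content=content,
            collections={logical_uri, self.LATEST},
        )
        self.docs[doc.uri] = doc
        return doc

    def latest(self, logical_uri: str) -> Optional[Doc]:
        """The version that is in both the logical and the 'latest' collection."""
        for doc in self.versions(logical_uri):
            if self.LATEST in doc.collections:
                return doc
        return None


def diff_latest(store: VersionedStore, logical_uri: str) -> list:
    """Unified diff between the previous and the latest version, if both exist."""
    history = store.versions(logical_uri)
    if len(history) < 2:
        return []
    old, new = history[-2], history[-1]
    return list(
        difflib.unified_diff(
            old.content.splitlines(),
            new.content.splitlines(),
            fromfile=old.uri,
            tofile=new.uri,
            lineterm="",
        )
    )


store = VersionedStore()
store.save("ABC.xml", "<a>first</a>")
store.save("ABC.xml", "<a>second</a>")
print(len(store.versions("ABC.xml")))  # prints 2
print(store.latest("ABC.xml").uri)     # prints ABC.xml/v2
```

The point of the sketch is that all three questions (how many versions, 
which is latest, diff against the previous one) are ordinary lookups over 
documents that all exist at the same time, so no timestamps are involved.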

Sorry for the confusion. 

Justin

--
Justin Makeig
Director, Product Management
MarkLogic
[email protected]

> On Jun 30, 2016, at 11:02 AM, Hans Hübner <[email protected]> wrote:
> 
> On Thu, Jun 30, 2016 at 6:30 PM, Justin Makeig <[email protected]> wrote:
>> The amount of data that we're accumulating by keeping the old versions 
>> around does not bother us.  
> 
> It will bother you if you never allow the database to merge. If you're 
> keeping a small window of history this will work fine. (Though an errant 
> timestamp setting by a config script will delete your history.) If you need 
> to keep the entire history, you will effectively be disabling merging with 
> this strategy, which will certainly land you in trouble. Merges are good; the 
> database needs them to optimize its internal data structures to support fast 
> and consistent ingest and queries.
> 
> Save each version as a separate document. Put all versions of a single 
> document in a collection to represent the "logical" document and give each 
> instance version a unique URI to represent its version number. You could even 
> create a special "latest" collection that contains only the latest version of 
> each document. This will allow you to do queries like, "How many versions of 
> (logical) document ABC.xml do I have?" "What's the latest version of 
> (logical) ABC.xml?" "Run this diff code on latest ABC.xml and its previous 
> version." With timestamps you'll have to know _when_ a document was updated 
> in order to get its previous version. This will require two steps for every 
> query and won't allow you to do any queries across versions, because the 
> older/newer versions don't exist, from the perspective of a query that runs 
> at a single timestamp.  
> 
> Thank you for the concrete architectural advice!  It does not seem to be very 
> bothersome to follow that route, so we will certainly trust you in that it is 
> better than using MVCC timestamps.
> 
> Let me suggest again that the "Time Travel" section in the "Inside MarkLogic" 
> document and the section on point-in-time queries in the "Application 
> Developer's Guide" be updated to include information on the caveats that you 
> and your colleagues have expressed.  I'm still a bit puzzled by the vehemence 
> that you all put forth into discouraging us from using it.  Are there any 
> other advertised features that can affect the health of a database in a 
> similar way and should thus be avoided?
> 
> Thanks!
> Hans
> 
> -- 
> LambdaWerk GmbH
> Oranienburger Straße 87/89
> 10178 Berlin
> Phone: +49 30 555 7335 0
> Fax: +49 30 555 7335 99
> 
> HRB 169991 B Amtsgericht Charlottenburg
> USt-ID: DE301399951
> Geschäftsführer:  Hans Hübner
> 
> http://lambdawerk.com/
> 
> 
> _______________________________________________
> General mailing list
> [email protected]
> Manage your subscription at: 
> http://developer.marklogic.com/mailman/listinfo/general
