You are correct, Hans: setting a merge timestamp does not disable merges. The downside of never getting rid of deleted fragments is that your database can grow without bound, with no sensible way to manage it. Point-in-time queries are really meant for relatively short durations. Some of the consequences of keeping all old versions are:
· Relevance: relevance is calculated based on all fragments in the database, so if, for example, you happened to have 1,000,000 versions of a particular document because a bug in your application code kept updating the same document (or for whatever reason), that would probably make things less relevant than they otherwise would be.
· Manageability: there is no way to manage the old versions; they are all always there.
· Size: your database might get very large.

Point-in-time queries are very useful for things like:

· Pagination: if you have a requirement that many pages of search results give the exact same answers for a relatively short period of time (for example, an hour or a day), you can keep the last day around and query those at a point in time.
· Publishing a new version of documents: if you want to load a new version of documents (say a magazine or similar) and test it in your production system while the old version is still in production, you can set the merge timestamp, make the users of the old version query at a point in time, load the new versions, and test the new material at the current timestamp.

There are lots of other ways to do this, but point in time is one way.

It might be tempting to use point-in-time queries for generic versioning, but it is usually not what you want.

Does that help to clarify?

-Danny

From: [email protected] [mailto:[email protected]] On Behalf Of Hans Hübner
Sent: Wednesday, June 29, 2016 12:19 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Bulk updates (xqsync vs. mlcp)

Justin,

thank you for the additional documentation pointer. From what I read, I understand that merging is a useful operation and that merges should not be disabled. I can agree to that, but as far as I have understood, the point-in-time feature does not require that we disable merges. It just requires that the merge timestamp is set to the earliest point in time that we want to be able to look back to.
Does setting the merge timestamp automatically disable the merges?

What I am still missing is why the "Inside MarkLogic" document describes how MVCC timestamps can be used to implement "Time Travel" and the "Application Developer's Guide" describes point-in-time queries if you (assuming that you speak for MarkLogic) advise against using them. The "Application Developer's Guide" in particular describes how such queries work, in detail, and it does not mention that one should avoid the technique.

Is the documentation accurate? Under what circumstances do you recommend using the point-in-time technique described in the guide? Does the point-in-time query technique only work if merges are disabled?

Hans

On Wed, Jun 29, 2016 at 7:40 PM, Justin Makeig <[email protected]> wrote:

> Can you elaborate what you mean by "maintain the health of a database"? If we decided that we never want to delete any data in a certain MarkLogic database so that we can roll back to any point in time, what would be the downsides? How would the database become unhealthy?

Please take a look at the docs on merging, specifically the section, "Merges Are Good" <https://docs.marklogic.com/guide/admin/merges#id_43904>. Merging is the way that MarkLogic manages its internal data to support efficient and consistent ingest and query I/O. It is an internal process and completely orthogonal to how you version your documents.

What you describe sounds more like temporal versioning. Please take a look at MarkLogic's bitemporal APIs <https://docs.marklogic.com/guide/temporal/intro>. With bitemporal management you maintain an immutable copy of the entire history of your data that you can query at any point in time. The APIs do all of the sophisticated work of maintaining versions securely. The "bi" in bitemporal allows you to query the valid time of the document (e.g. a trade was effective on 2016-06-01) as you knew it at any point in time (e.g. the trade wasn't recorded until 2016-06-02 and then it was corrected on 2016-06-05).

Justin

On Jun 28, 2016, at 9:55 PM, Hans Hübner <[email protected]> wrote:

On Tue, Jun 28, 2016 at 10:36 PM, Justin Makeig <[email protected]> wrote:

> > as we want to be able to use the point-in-time query feature to track
> > document changes over time
>
> Point-in-time queries <https://docs.marklogic.com/guide/app-dev/point_in_time> are not designed for versioning, as I think you're describing it. The timestamps are internal bookkeeping. (Think of them as monotonically increasing integers rather than wall clock readings.) Querying at a specific timestamp relies on _not_ merging deleted fragments. For short windows, like minutes or even hours, depending on your workload, this is OK. However, merging is necessary and useful to maintain the health of a database.

Can you elaborate what you mean by "maintain the health of a database"? If we decided that we never want to delete any data in a certain MarkLogic database so that we can roll back to any point in time, what would be the downsides? How would the database become unhealthy?

We have an existing application that makes use of another database system (Datomic) exactly in that way, and we would like to carry it over to MarkLogic. The "Inside MarkLogic" document describes point-in-time queries as "Time Travel", but what you write seems to say that using timestamps that way would be detrimental to the health of the database, so I'd like to learn more before we convert.

Thanks!
Hans

--
LambdaWerk GmbH
Oranienburger Straße 87/89
10178 Berlin
Phone: +49 30 555 7335 0
Fax: +49 30 555 7335 99

HRB 169991 B Amtsgericht Charlottenburg
USt-ID: DE301399951
Geschäftsführer: Hans Hübner

http://lambdawerk.com/

_______________________________________________
General mailing list
[email protected]
Manage your subscription at: http://developer.marklogic.com/mailman/listinfo/general
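The interplay between the merge timestamp, deleted fragments, and point-in-time queries discussed in this thread can be sketched with a toy model. This is not MarkLogic code and does not use any MarkLogic API: `ToyDatabase`, `Fragment`, `write`, `read`, and `merge` are hypothetical names made up for illustration, and the timestamps are plain counters (in the spirit of "monotonically increasing integers" above). The assumption is simply that an update marks the old fragment as deleted rather than removing it, and that a merge may only discard deleted fragments that no query at or after the merge timestamp could still see.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fragment:
    uri: str
    content: str
    created: int                   # timestamp at which this version became visible
    deleted: Optional[int] = None  # timestamp at which it was superseded; None while live


class ToyDatabase:
    """Toy MVCC store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.clock = 0
        self.fragments: List[Fragment] = []
        # None means a merge may discard every deleted fragment.
        self.merge_timestamp: Optional[int] = None

    def write(self, uri: str, content: str) -> int:
        """Insert or update a document; the old version is only marked deleted."""
        self.clock += 1
        for f in self.fragments:
            if f.uri == uri and f.deleted is None:
                f.deleted = self.clock
        self.fragments.append(Fragment(uri, content, self.clock))
        return self.clock

    def read(self, uri: str, at: Optional[int] = None) -> Optional[str]:
        """Return the version visible at timestamp `at` (default: latest)."""
        ts = self.clock if at is None else at
        for f in self.fragments:
            if f.uri == uri and f.created <= ts and (f.deleted is None or f.deleted > ts):
                return f.content
        return None

    def merge(self) -> None:
        """Discard deleted fragments that no query at or after the merge timestamp can see."""
        keep_from = self.clock if self.merge_timestamp is None else self.merge_timestamp
        self.fragments = [
            f for f in self.fragments if f.deleted is None or f.deleted > keep_from
        ]


db = ToyDatabase()
t1 = db.write("/doc.xml", "v1")
t2 = db.write("/doc.xml", "v2")
db.merge_timestamp = t2          # we want to be able to look back as far as t2
t3 = db.write("/doc.xml", "v3")
db.merge()                       # merges still run; they just keep what t2 needs

latest = db.read("/doc.xml")          # "v3"
at_t2 = db.read("/doc.xml", at=t2)    # "v2", still queryable
at_t1 = db.read("/doc.xml", at=t1)    # None, v1 was merged away
```

In this model, setting the merge timestamp does not stop merges; it only limits what they may discard. It also shows the growth problem Danny describes: every update after the merge timestamp leaves one more fragment behind until the timestamp is moved forward.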
