Re: [MarkLogic Dev General] Bulk updates (xqsync vs. mlcp)

Justin Makeig Wed, 29 Jun 2016 14:59:04 -0700

> the point-in-time feature does not require that we disable merges.  It just 
> requires that the merge timestamp is set to the earliest point back in time 
> where we want to be able to look back to.

Yes. That's correct. The further you push the merge timestamp back, the more 
you're going to stress normal operations, though. Merges allow the database to 
optimize the storage and indexes to support high-performance I/O. They're not a 
nice-to-have; they're a required aspect of how MarkLogic works. The ability to 
delay them or turn them off is an advanced option for special cases, such as 
rolling back a database for disaster recovery.

> What I am still missing is why the "Inside MarkLogic" document describes how 
> MVCC timestamps can be used to implement "Time Travel" and the "Application 
> Developer's Guide" describes point-in-time queries if you (assuming that you 
> speak for MarkLogic) advise against using them.

Point-in-time queries are good for "micro" time travel, if I may coin a term. 
They're good for maintaining a consistent snapshot over a very short period of 
time, within the finite window that you'd configure to not merge. Beyond that 
window, the history is gone—optimized away. If you need to keep that history 
you should do so explicitly in documents. (That's how the Document Library 
Services and Bitemporal APIs maintain version histories.)

> Is the documentation accurate?

Yes, but a little light on why you'd use point-in-time queries and what the 
boundaries and implications are.

> Under what circumstances do you recommend using the point-in-time technique 
> described in the guide?  Does the point-in-time query technique only work if 
> merges are disabled?

Yes, doing anything at a point in time in the past means that you need to 
maintain the (MVCC) state back to that point. The only way the database can do 
that is by not merging out all of the obsolete fragments. This is OK for finite 
windows of time, but the database needs to eventually merge. (You can give the 
merge timestamp a negative value to maintain a rolling window.)
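
To make the trade-off concrete, here is a minimal sketch in plain Python. It is 
a toy model, not MarkLogic code: the `Fragment` class, the `visible_at`, `merge` 
and `rolling_horizon` helpers, and the integer timestamps are all assumptions 
made up for illustration. It only shows why a point-in-time read needs the 
obsolete fragments to still be there, and how a rolling window bounds how far 
back you can look.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fragment:
    uri: str
    content: str
    created: int                   # timestamp at which this version became visible
    deleted: Optional[int] = None  # timestamp at which it was replaced; None = live


def visible_at(fragments: List[Fragment], uri: str, ts: int) -> Optional[str]:
    """Return the content of `uri` as of timestamp `ts`, or None if unknown."""
    for f in fragments:
        if f.uri == uri and f.created <= ts and (f.deleted is None or ts < f.deleted):
            return f.content
    return None


def merge(fragments: List[Fragment], horizon: int) -> List[Fragment]:
    """Toy merge: discard obsolete fragments deleted at or before `horizon`."""
    return [f for f in fragments if f.deleted is None or f.deleted > horizon]


def rolling_horizon(now: int, window: int) -> int:
    """Hypothetical helper: a horizon that trails `now` by a fixed window."""
    return now - window


# Three successive versions of one document (timestamps are arbitrary integers).
fragments = [
    Fragment("/doc.xml", "v1", created=10, deleted=20),
    Fragment("/doc.xml", "v2", created=20, deleted=30),
    Fragment("/doc.xml", "v3", created=30),
]

# With now=35 and a window of 10, everything deleted at or before 25 is dropped.
merged = merge(fragments, rolling_horizon(now=35, window=10))
```

In this model a read at timestamp 27 still finds "v2" after the merge, while a 
read at timestamp 15 finds nothing, because "v1" was merged away. That is the 
"history is gone" case outside the configured window.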

Justin

--
Justin Makeig
Director, Product Management
MarkLogic

On Jun 29, 2016, at 12:18 PM, Hans Hübner wrote:

Justin,

thank you for the additional documentation pointer.  From what I read, I 
understand that merging is a useful operation and that merges should not be 
disabled.  I can agree to that, but as far as I have understood, the 
point-in-time feature does not require that we disable merges.  It just 
requires that the merge timestamp is set to the earliest point back in time 
where we want to be able to look back to.  Does setting the merge timestamp 
automatically disable the merges?

What I am still missing is why the "Inside MarkLogic" document describes how 
MVCC timestamps can be used to implement "Time Travel" and the "Application 
Developer's Guide" describes point-in-time queries if you (assuming that you 
speak for MarkLogic) advise against using them.  The "Application Developer's 
Guide" in particular describes how such queries work, in detail, and it does 
not mention that one should avoid the technique.

Is the documentation accurate?  Under what circumstances do you recommend using 
the point-in-time technique described in the guide?  Does the point-in-time 
query technique only work if merges are disabled?

Hans

On Wed, Jun 29, 2016 at 7:40 PM, Justin Makeig wrote:
> Can you elaborate what you mean by "maintain the health of a database"?  If 
> we decided that we never want to delete any data in a certain MarkLogic 
> database so that we can roll back to any point in time, what would be the 
> downsides?  How would the database become unhealthy?

Please take a look at the docs on merging, specifically the section, "Merges 
Are Good" <https://docs.marklogic.com/guide/admin/merges#id_43904>. Merging is 
the way that MarkLogic manages its internal data to support efficient and 
consistent ingest and query I/O. It is an internal process and completely 
orthogonal to how you version your documents.

What you describe sounds more like temporal versioning. Please take a look at 
MarkLogic's bitemporal APIs <https://docs.marklogic.com/guide/temporal/intro>. 
With bitemporal management you maintain an immutable copy of the entire history 
of your data that you can query at any point in time. The APIs do all of the 
sophisticated work maintaining versions securely. The "bi" in bitemporal allows 
you to query the valid time of the document (e.g. a trade was effective on 
2016-06-01) as you knew it at any point in time (e.g. the trade wasn't recorded 
until 2016-06-02 and then it was corrected on 2016-06-05).
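
As a rough illustration of the two time axes, here is a minimal sketch in plain 
Python. It does not use MarkLogic's temporal API; the `Version` class, the 
`as_known_on` helper, and the prices are hypothetical, and the valid-time axis 
is simplified to an open-ended start date. Only the three dates come from the 
example above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    valid_from: date            # when the fact became true in the real world
    system_from: date           # when this version was recorded
    system_to: Optional[date]   # when it was superseded; None = still current
    price: float                # hypothetical payload


# The trade was effective 2016-06-01, recorded 2016-06-02, corrected 2016-06-05.
# Superseded versions are kept, never overwritten.
HISTORY = [
    Version(date(2016, 6, 1), date(2016, 6, 2), date(2016, 6, 5), 100.0),
    Version(date(2016, 6, 1), date(2016, 6, 5), None, 101.5),
]


def as_known_on(history: List[Version], valid_on: date, known_on: date) -> Optional[Version]:
    """Return the version valid on `valid_on`, as the system knew it on `known_on`."""
    for v in history:
        recorded = v.system_from <= known_on and (
            v.system_to is None or known_on < v.system_to
        )
        if recorded and v.valid_from <= valid_on:
            return v
    return None


effective = date(2016, 6, 1)
before_recording = as_known_on(HISTORY, effective, date(2016, 6, 1))
before_correction = as_known_on(HISTORY, effective, date(2016, 6, 3))
after_correction = as_known_on(HISTORY, effective, date(2016, 6, 6))
```

Asking about the same effective date gives three different answers depending on 
when you ask: nothing before the trade was recorded, the original version on 
2016-06-03, and the corrected version on 2016-06-06.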

Justin

On Jun 28, 2016, at 9:55 PM, Hans Hübner wrote:

On Tue, Jun 28, 2016 at 10:36 PM, Justin Makeig wrote:
>> as we want to be able to use the point-in-time query feature to track 
>> document changes over time
>
> Point-in-time queries <https://docs.marklogic.com/guide/app-dev/point_in_time> 
> are not designed for versioning, as I think you're describing it. The 
> timestamps are internal bookkeeping. (Think of them as monotonically increasing 
> integers rather than wall clock readings.) Querying at a specific timestamp 
> relies on _not_ merging deleted fragments. For short windows, like minutes or 
> even hours, depending on your workload, this is OK. However, merging is 
> necessary and useful to maintain the health of a database.

Can you elaborate what you mean by "maintain the health of a database"?  If 
we decided that we never want to delete any data in a certain MarkLogic 
database so that we can roll back to any point in time, what would be the 
downsides?  How would the database become unhealthy?

We have an existing application that makes use of another database system 
(Datomic) exactly in that way, and we would like to carry it over to MarkLogic. 
The "Inside MarkLogic" document describes point-in-time queries as "Time 
Travel", but what you write seems to say that using timestamps that way would 
be detrimental to the health of the database, so I'd like to learn more before 
we convert.

Thanks!
Hans

--
LambdaWerk GmbH
Oranienburger Straße 87/89
10178 Berlin
Phone: +49 30 555 7335 0
Fax: +49 30 555 7335 99

HRB 169991 B Amtsgericht Charlottenburg
USt-ID: DE301399951
Geschäftsführer:  Hans Hübner

http://lambdawerk.com/

_______________________________________________
General mailing list
[email protected]<mailto:[email protected]>
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general