MarkLogic automatically locks document URIs as necessary. The goal is to design
your document URIs to enforce whatever constraints you need.

The best way to avoid a conflict is to build the version into the document URI,
as well as having it in the XML. If your URI is something like
/{$id}/{$version} then concurrent attempts to insert the same id and version
will try to lock the same URI. One of them will win, and the other will retry.
This also means step #2 in your process is as simple as exists(doc($uri)). Don't
use xdmp:exists for this, because that function won't read-lock the URI.
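
As a rough sketch of the id-version URI approach, assuming the incoming
document has <id> and <version> child elements as described in the original
question: the <record> wrapper, the sample values, and the inline $doc variable
are placeholders of mine, and a real endpoint would read the POSTed body
instead. Removing older versions from the 'live' collection is left out here.

```xquery
xquery version "1.0-ml";

(: Hypothetical input; a real endpoint would take this from the request body. :)
declare variable $doc as element() :=
  <record><id>abc123</id><version>7</version></record>;

(: Build the URI from id and version, following the /{$id}/{$version} pattern. :)
let $uri := concat("/", $doc/id, "/", $doc/version)
return
  (: This module contains an update call, so it runs as an update statement
     and doc($uri) read-locks the URI. Concurrent requests for the same id
     and version therefore contend for the same lock. :)
  if (exists(doc($uri))) then
    concat("skipped, already present: ", $uri)
  else (
    xdmp:document-insert($uri, $doc),
    concat("inserted: ", $uri)
  )
```

The point of the sketch is only that the existence check and the insert both
name the same URI, so the database's own locking settles the conflict.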

If for some reason you can't build the id and version into your URIs, fake it
with an intent lock. Use whatever real URI you like, but in the same insert
code construct a fake URI with the id and version, and call xdmp:lock-for-update
(https://docs.marklogic.com/xdmp:lock-for-update) to lock that fake URI
explicitly. Again, any concurrent requests will have to resolve the conflict,
and one will win. You'll still have to check for existing versions in step #2,
but at least you'll have a write lock on the id and version.
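
A minimal sketch of the intent-lock variant, under the same assumptions as
before about <id> and <version>. The /locks/ prefix, the /docs/ URI scheme, the
<record> wrapper, and the XPath used for the existence check are all
illustrative choices of mine, not anything prescribed by MarkLogic.

```xquery
xquery version "1.0-ml";

(: Hypothetical input; a real endpoint would take this from the request body. :)
declare variable $doc as element() :=
  <record><id>abc123</id><version>7</version></record>;

(: Hypothetical "real" URI scheme that does not encode id or version. :)
let $real-uri := concat("/docs/", xdmp:random(), ".xml")
(: Fake URI used only as a lock name; no document is ever stored at it. :)
let $lock-uri := concat("/locks/", $doc/id, "/", $doc/version)
return (
  (: Listed first so the write lock is requested before the check below. :)
  xdmp:lock-for-update($lock-uri),
  (: Step #2 still has to look for an existing copy of this id and version. :)
  if (exists(collection()/record[id eq $doc/id][version eq $doc/version])) then
    concat("skipped, id and version already present: ", $lock-uri)
  else (
    xdmp:document-insert($real-uri, $doc),
    concat("inserted: ", $real-uri)
  )
)
```

Every code path that inserts a given id and version has to derive the same
fake URI, otherwise the lock protects nothing.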

Note that conflict resolution can be bad for performance. It's best to design
your ingestion process such that conflicts will be rare. Having that step #2
helps, but this is another reason to prefer a real id-version URI over an
intent lock.
-- Mike
On 30 Jun 2014, at 09:33 , Retter, Adam (RBI-UK) <[email protected]> wrote:
> We have what I consider to be an interesting issue with an XQuery that is run
> as a REST endpoint: we have identified at least two race conditions.
> Typically I would fix these by enforcing something like a Critical Section in
> the code through appropriate locking.
> Unfortunately after lots of head scratching and re-reading of documentation I
> cannot at the moment see how to solve this with the facilities provided in
> MarkLogic and am looking for some guidance. I guess this is a common issue
> that others must have solved before, so I am most likely missing something
> obvious!
>
> Our REST endpoint effectively does the following:
>
> 1) XQuery REST Endpoint - receives an XML document over HTTP POST. Let's call
> this document B.
> 2) Searches the database for an existing document, which has an <id> element
> with the same value as that in document B. Assuming we find a document, let
> us call that document A.
> 3) Check the version of document B against document A. The version is
> indicated in a <version> element in each document respectively. The version
> of document B should be newer than document A, if not then stop, else
> continue.
> 4) Remove document A from the 'live' collection
> 5) Insert document B into the database and add it to the 'live' collection.
>
> Now this REST end-point may be called by many clients in parallel, which
> means not just adding the new document B, but in parallel running the above
> query for document C, D, E ... nN. I think we are seeing three separate race
> conditions appearing:
>
> i) Steps (4) and (5) where the same version of the document with the same id
> can be inserted into the live collection. Typically step (4) tries to ensure
> there is only one live version by removing the old document (document A) from
> the live collection, before adding the new document (document B) to the live
> collection.
>
> ii) Steps (3) and (5) where multiple versions can be inserted into the live
> collection.
>
> iii) Steps (3) and (5) where sometimes an older version is inserted after a
> newer version.
>
> I believe that due to the number of client requests, we are effectively
> seeing threads pre-empt other threads within this query and because no
> explicit locking has yet been added to the system, we have problems.
>
> How can I make the steps (1) through (5) thread-safe?
>
> I have tried adding xdmp:transaction-mode "update"; to my REST query, and
> using an explicit xdmp:commit at the end. This has not helped at all, but I
> think that is because we are never writing the same document: every document
> we write in steps (4) and (5) will always have a different URI in the
> database. I think really that we need to be able to lock based on an abstract
> uri (e.g. the content of our id element) and not the document uri as that
> varies over time in our model.
>
> I also looked at xdmp:lock-acquire, but it appears the locks are shared for a
> single user, i.e. it states - "When a user locks a URI, it is locked to other
> users, but not to the user who locked it". The problem I have here is that
> this is effectively a public unauthenticated REST endpoint, so it will
> always be the same user running the query as far as ML is concerned.
>
> Does anyone have any suggestions of how we might achieve what we are looking
> for?
>
> Cheers Adam.
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
>