Hi Michael,

Thanks for your reply. I think I am still missing something, as it is not clear 
to me how encoding both the id and the version into the document URI would 
help. I could understand encoding just the id, as that will not change over 
time, but each request potentially writes a different version.

Example 1
=========
For example, if I follow your suggestion of encoding the id and version into 
the document URI. Let us assume that a document already exists in the database 
with id=1234 and version=1, therefore the URI is /1234-v1.xml:

XQuery Thread
-------------
0) Set transaction mode to 'update'
1) XQuery REST Endpoint - receives an XML document over HTTP POST which has 
id=1234 and version=2
2) Searches the database for an existing document which contains an id element 
with value 1234. It finds the document /1234-v1.xml. (I assume this causes the 
query transaction to take a READ lock on the document URI /1234-v1.xml?)
3) It compares the version of the two documents. The posted document has a 
newer version so it continues.
4) Removes the collection 'live' from the document /1234-v1.xml. (Does this 
take a WRITE lock on /1234-v1.xml?)
5) Inserts the posted content into the database as the document /1234-v2.xml 
and adds it to the 'live' collection. (Does this take a WRITE lock on 
/1234-v2.xml?)
6) Calls xdmp:commit (Presumably all READ and WRITE locks are released here?)

If I have more than one of these threads executing in parallel, it seems to me 
that through thread pre-emption it is still possible for more than one thread 
to reach at least the end of (3) before any sort of lock contention occurs. 
Imagining just two parallel threads for the moment, I think that means that 
when the first thread to acquire the lock releases it in (6), the other thread 
will continue through (4) - (6). Is that correct? If so, that leads to a 
different class of errors:

a) If both posted documents have version=2, then yes, I cannot generate a 
duplicate in the database, as the second thread to complete will overwrite the 
v2 document of the first thread. But which was meant to be the correct v2?

b) If the posted documents have different versions, both greater than 
version=1, then I may end up with both version=2 and version=3 documents in 
the live collection.

If I understand you correctly and my assumptions above are correct, then to a) 
prevent inserting the same version and id, and to b) also prevent inserting the 
same id and different versions, we would need to re-design our document URI 
scheme to *just* include the id of the document and *not* the version. Is that 
correct?

As you suggested, I was considering using xdmp:lock-for-update. Introducing 
this between steps (1) and (2) of the above, and taking the lock on the id of 
our record (i.e. ignoring the version), does indeed seem to fix our issues. 
Thank you very much for your guidance, Mike. If you have any comments or 
clarifications on what I have written and my assumptions, I would be glad to 
hear from you.
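
For illustration, here is a minimal sketch of steps (1) to (5) with the lock 
taken on the id alone. It is not our production code. It assumes the posted 
root element has <id> and <version> children, and the "/locks/record/" prefix 
and the local:store function name are made up for this example. Because the 
whole thing runs as a single update statement, I am relying on the locks being 
released when that statement commits.

```xquery
xquery version "1.0-ml";

(: Sketch only. Assumes the posted root element has <id> and <version>
   children, and that "/locks/record/" is a made-up prefix for lock URIs
   that never holds a real document. :)
declare function local:store($posted as element()) as empty-sequence()
{
  let $id := string($posted/id)
  let $version := xs:integer($posted/version)
  (: Between steps (1) and (2): write-lock a URI built from the id alone, so
     concurrent requests for the same id contend here, not after step (3). :)
  let $_ := xdmp:lock-for-update(concat("/locks/record/", $id))
  (: Step (2): find the live document(s) for this id. :)
  let $live := cts:search(doc(),
    cts:and-query((
      cts:collection-query("live"),
      cts:element-value-query(xs:QName("id"), $id))))
  return
    (: Step (3): stop unless the posted version is strictly newer. :)
    if (some $v in $live//version satisfies xs:integer($v) ge $version)
    then ()
    else (
      (: Step (4): take the older document(s) out of 'live'. :)
      for $old in $live
      return xdmp:document-remove-collections(xdmp:node-uri($old), "live"),
      (: Step (5): insert the posted document into 'live'. :)
      xdmp:document-insert(
        concat("/", $id, "-v", $version, ".xml"),
        $posted, xdmp:default-permissions(), "live")
    )
};

(: Step (1): the body of the HTTP POST, parsed as XML. :)
local:store(xdmp:get-request-body("xml")/*)
```

The version check is "greater than or equal", so a second post of the same 
version is dropped rather than overwriting the first.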


-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: 30 June 2014 18:47
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Locking and Transactions in REST 
read+update

MarkLogic automatically locks document URIs as necessary. The goal is to design 
your document URIs to enforce whatever constraints you need.

The best way to avoid a conflict is to build the version into the document URI, 
as well as having it in the XML. If your URI is something like 
/{$id}/{$version} then concurrent attempts to insert the same id and version 
will try to lock the same URI. One of them will win, and the other will retry. 
This also means step #2 in your process is as simple as exists(doc($uri)) - but 
not xdmp:exists, because that function won't read-lock the URI.
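
A rough sketch of that check-then-insert, assuming <id> and <version> are 
children of the posted root element. The function name local:insert-if-new is 
made up for illustration, and the collection handling from your steps (4) and 
(5) is left out.

```xquery
xquery version "1.0-ml";

(: Sketch only. Assumes <id> and <version> are children of the posted root. :)
declare function local:insert-if-new($posted as element()) as empty-sequence()
{
  (: The URI carries both id and version, e.g. /1234/2 :)
  let $uri := concat("/", $posted/id, "/", $posted/version)
  return
    (: Step #2: doc($uri), not xdmp:exists, so that the URI is read-locked. :)
    if (exists(doc($uri)))
    then ()
    (: Concurrent inserts of the same id and version contend on this URI. :)
    else xdmp:document-insert($uri, $posted)
};

local:insert-if-new(xdmp:get-request-body("xml")/*)
```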

If for some reason you can't build the id and version into your URIs, fake it 
with an intent lock. Use whatever real URI you like, but in the same insert 
code construct a fake URI with the id and version, and call 
https://docs.marklogic.com/xdmp:lock-for-update to lock that fake URI 
explicitly. Again any concurrent requests will have to resolve the conflict, 
and one will win. You'll still have to check for existing versions in step #2, 
but at least you'll have a write lock on the id and version.
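
As a sketch of the intent-lock variant, under the same assumption that <id> 
and <version> are children of the posted root: the "/intent-lock/" prefix, the 
"/docs/" URI scheme and the function name are all hypothetical, and the search 
stands in for whatever step #2 lookup you already have.

```xquery
xquery version "1.0-ml";

(: Sketch only. "/intent-lock/" and "/docs/" are made-up URI prefixes. :)
declare function local:insert-with-intent-lock($posted as element())
  as empty-sequence()
{
  let $id := string($posted/id)
  let $version := string($posted/version)
  (: Write-lock a fake URI built from id and version. No document is ever
     stored there; it only serializes requests for the same id and version. :)
  let $_ := xdmp:lock-for-update(concat("/intent-lock/", $id, "/", $version))
  (: Step #2 still has to search, because the real URI says nothing
     about the id or the version. :)
  let $existing := cts:search(doc(),
    cts:and-query((
      cts:element-value-query(xs:QName("id"), $id),
      cts:element-value-query(xs:QName("version"), $version))))
  return
    if (exists($existing))
    then ()
    else xdmp:document-insert(concat("/docs/", xdmp:random(), ".xml"), $posted)
};

local:insert-with-intent-lock(xdmp:get-request-body("xml")/*)
```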

Note that conflict resolution can be bad for performance. It's best to design 
your ingestion process such that conflicts will be rare. Having that step #2 
helps, but this is another reason to prefer a real id-version URI over an 
intent lock.

-- Mike

On 30 Jun 2014, at 09:33 , Retter, Adam (RBI-UK) <[email protected]> wrote:

> We have what I consider to be an interesting issue with an XQuery that is run 
> as a REST endpoint, basically we have at least two race-conditions that we 
> have identified. Typically I would fix these by enforcing something like a 
> Critical Section in the code through appropriate locking. 
> Unfortunately after lots of head scratching and re-reading of documentation I 
> cannot at the moment see how to solve this with the facilities provided in 
> MarkLogic and am looking for some guidance. I guess this is a common issue 
> that others must have solved before, so I am most likely missing something 
> obvious!
> 
> Our REST endpoint effectively does the following:
> 
> 1) XQuery REST Endpoint - receives an XML document over HTTP POST. Let's call 
> this document B.
> 2) Searches the database for an existing document, which has an <id> element 
> with the same value as that in document B. Assuming we find a document, let 
> us call that document A.
> 3) Check the version of document B against document A. The version is 
> indicated in a <version> element in each document respectively. The version 
> of document B should be newer than document A, if not then stop, else 
> continue.
> 4) Remove document A from the 'live' collection
> 5) Insert document B into the database and add it to the 'live' collection.
> 
> Now this REST end-point may be called by many clients in parallel, which 
> means not just adding the new document B, but in parallel running the above 
> query for document C, D, E ... N. I think we are seeing three separate race 
> conditions appearing:
> 
> i) Steps (4) and (5) where the same version of the document with the same id 
> can be inserted into the live collection. Typically step (4) tries to ensure 
> there is only one live version by removing the old document (document A) from 
> the live collection, before adding the new document (document B) to the live 
> collection.
> 
> ii) Steps (3) and (5) where multiple versions can be inserted into the live 
> collection.
> 
> iii) Steps (3) and (5) where sometimes an older version is inserted after a 
> newer version.
> 
> I believe that due to the number of client requests, we are effectively 
> seeing threads pre-empt other threads within this query and because no 
> explicit locking has yet been added to the system, we have problems.
> 
> How can I make the steps (1) through (5) thread-safe?
> 
> I have tried adding xdmp:transaction-mode "update"; to my REST query, and 
> using an explicit xdmp:commit at the end. This has not helped at all, but I 
> think that is because we are never writing the same document, every document 
> we write in steps (4) and (5) will always have a different URI in the 
> database. I think really that we need to be able to lock based on an abstract 
> uri (e.g. the content of our id element) and not the document uri as that 
> varies over time in our model.
> 
> I also looked at xdmp:lock-acquire, but it appears the locks are shared for a 
> single user, i.e. it states - "When a user locks a URI, it is locked to other 
> users, but not to the user who locked it", the problem I have here is that 
> this is a public un-authenticated REST end-point effectively so it will 
> always be the same user running the query as far as ML is concerned.
> 
> Does anyone have any suggestions of how we might achieve what we are looking 
> for?
> 
> Cheers Adam.
> 
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
> 

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general