Yes, steps (2, 4, 5) will take locks as you described. Any number of concurrent
requests could get to step #3 before the write locks have to be taken.

I believe URIs like {$id}-{$version} would be safe because any threads that
don't get the write locks will restart from the beginning, rather than resuming
at step #4 or #5. By restarting they would see the latest updates.
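
For reference, here is a rough sketch of steps 1 to 5 with the locks marked.
It assumes <id> and <version> are direct children of the root element, that
versions are integers, and that URIs follow the /{$id}-v{$version}.xml naming
from your example. Treat it as an illustration rather than tested code:

```xquery
xquery version "1.0-ml";

(: Step 1: the posted document (assumed to arrive as XML). :)
let $new := xdmp:get-request-body("xml")/*
let $id := $new/id/string()
(: Step 2: find the live document for this id. In an update
   transaction the matched document is read-locked. :)
let $old := cts:search(doc(),
  cts:and-query((
    cts:collection-query("live"),
    cts:element-value-query(xs:QName("id"), $id))))[1]/*
return
  (: Step 3: stop unless the posted version is newer. :)
  if (exists($old) and
      xs:integer($new/version) le xs:integer($old/version))
  then ()
  else (
    (: Step 4: write lock on the old document's URI. :)
    if (exists($old))
    then xdmp:document-remove-collections(xdmp:node-uri($old), "live")
    else (),
    (: Step 5: write lock on the new document's URI. :)
    xdmp:document-insert(
      concat("/", $id, "-v", $new/version, ".xml"),
      $new, xdmp:default-permissions(), "live")
  )
```

Two requests that both read-lock the old document in step 2 cannot both take
the write lock in step 4, which is where one of them restarts.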

However, I think you're right that it's better to use {$id}-{$version} in
archived versions and {$id} alone in the live version. That makes the update
less dependent on versioning logic. Perhaps more importantly, it optimizes
getting the latest version by id: doc($uri) instead of a search by id and
collection. Adding a new version means overwriting the live document and
copying the old XML to {$id}-{$previous-version}. I think update speed will
remain about the same, though, because you're already doing two updates:
removing the 'live' collection from {$id}-{$previous-version} and writing to
{$id}-{$current-version}.

Better still, use something like {$id}/{$version} and {$id}/current. That gives
you the advantages described above, and also gives you the ability to query all
versions for an id using cts:directory-query.
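
A minimal sketch of that layout, not tested against your data. It assumes the
posted document has <id> and <version> as direct children of its root element
and that versions are integers. local:put-version is a hypothetical helper
name, and the URIs are /{$id}/current and /{$id}/{$version}:

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: make $new the current version for its id,
   archiving the previous current document under its version number. :)
declare function local:put-version($new as element()) as empty-sequence()
{
  let $dir := concat("/", $new/id/string(), "/")
  let $current-uri := concat($dir, "current")
  (: doc() read-locks the URI in an update transaction; the insert
     below then needs the write lock on the same URI. :)
  let $old := doc($current-uri)/*
  return
    if (exists($old) and
        xs:integer($new/version) le xs:integer($old/version))
    then ()  (: same or older version: nothing to do :)
    else (
      (: Copy the old XML to /{$id}/{$previous-version}. :)
      if (exists($old))
      then xdmp:document-insert(concat($dir, $old/version), $old)
      else (),
      (: Overwrite /{$id}/current with the posted document. :)
      xdmp:document-insert($current-uri, $new)
    )
};

local:put-version(xdmp:get-request-body("xml")/*)
```

With that layout, all versions for id 1234 should come back from something
like cts:search(doc(), cts:directory-query("/1234/", "1")).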
-- Mike

On 1 Jul 2014, at 07:53, Retter, Adam (RBI-UK) <[email protected]> wrote:
> Hi Michael,
>
> Thanks for your reply. I guess I am still missing something as it is not
> clear to me how encoding both the id and version into the file URI would help
> me? I could understand if I was just encoding the id as that will not change
> over time, however for each request we are potentially writing a different
> version.
>
> Example 1
> ========
> For example, if I follow your suggestion of encoding the id and version into
> the document URI. Let us assume that a document already exists in the
> database with id=1234 and version=1, therefore the URI is /1234-v1.xml:
>
> XQuery Thread
> -------------------
> 0) Set transaction mode to 'updating'
> 1) XQuery REST Endpoint - receives an XML document over HTTP POST which is
> id=1234
> 2) Searches the database for an existing document, which contains an id
> element with value 1234, and a version element with value 2. It finds the
> document /1234-v1.xml. (I assume this causes the query transaction to take a
> READ lock on the document URI /1234-v1.xml?)
> 3) It compares the version of the two documents. The posted document has a
> new version so it continues.
> 4) Removes the collection 'live' from the document /1234-v1.xml. (Does this
> take a WRITE lock on /1234-v1.xml?)
> 5) Inserts the posted content into the database as the document
> /1234-v2.xml and adds it to the 'live' collection. (Does this take a WRITE
> lock on /1234-v2.xml?)
> 6) Call xdmp:commit (Presumably all READ and WRITE locks are released here?)
>
> If I have more than one of these threads executing in parallel, it seems to
> me that through thread pre-emption it is still possible for more than one
> thread to reach at least the end of (3) before any sort of lock contention
> occurs. Imagining there are just two threads in parallel for the moment, I
> think that means that when the first thread to acquire the lock releases it
> in (6), the other thread will continue through (4) - (6), is that correct?
> If so, that leads to a different class of errors:
>
> a) If both posted documents that initiated the threads have version=2, then
> yes, I cannot generate a duplicate in the database, as the second thread to
> complete will overwrite the v2 document of the first thread. But which was
> meant to be the correct v2?
>
> b) If the posted documents have different versions, both greater than
> version=1, then I may end up with both version=2 and version=3 documents in
> the live collection.
>
> If I understand you correctly and my assumptions above are correct, then to
> a) prevent inserting the same version and id, and to b) also prevent
> inserting the same id and different versions, we would need to re-design our
> document URI scheme to *just* include the id of the document and *not* the
> version. Is that correct?
>
> As you suggested I was considering using xdmp:lock-for-update. Introducing
> this between steps (1) and (2) of the above and taking the lock on the
> id of our record (i.e. ignoring the version) does indeed seem to fix our
> issues. Thank you very much for your guidance Mike. If you have any comments
> or clarifications on what I have written and my assumptions, I would be glad
> to hear from you further...
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Michael Blakeley
> Sent: 30 June 2014 18:47
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Locking and Transactions in REST
> read+update
>
> MarkLogic automatically locks document URIs as necessary. The goal is to
> design your document URIs to enforce whatever constraints you need.
>
> The best way to avoid a conflict is to build the version into the document
> URI, as well as having it in the XML. If your URI is something like
> /{$id}/{$version} then concurrent attempts to insert the same id and version
> will try to lock the same URI. One of them will win, and the other will
> retry. This also means step #2 in your process is as simple as
> exists(doc($uri)) - but not xdmp:exists, because that function won't
> read-lock the URI.
>
> If for some reason you can't build the id and version into your URIs, fake it
> with an intent lock. Use whatever real URI you like, but in the same insert
> code construct a fake URI with the id and version, and call
> https://docs.marklogic.com/xdmp:lock-for-update to lock that fake URI
> explicitly. Again any concurrent requests will have to resolve the conflict,
> and one will win. You'll still have to check for existing versions in step
> #2, but at least you'll have a write lock on the id and version.
>
> Note that conflict resolution can be bad for performance. It's best to design
> your ingestion process such that conflicts will be rare. Having that step #2
> helps, but this is another reason to prefer a real id-version URI over an
> intent lock.
>
> -- Mike
>
> On 30 Jun 2014, at 09:33 , Retter, Adam (RBI-UK) <[email protected]>
> wrote:
>
>> We have what I consider to be an interesting issue with an XQuery that is
>> run as a REST endpoint: we have identified at least two race conditions.
>> Typically I would fix these by enforcing something like a Critical Section
>> in the code through appropriate locking.
>> Unfortunately after lots of head scratching and re-reading of documentation
>> I cannot at the moment see how to solve this with the facilities provided in
>> MarkLogic and am looking for some guidance. I guess this is a common issue
>> that others must have solved before, so I am most likely missing something
>> obvious!
>>
>> Our REST endpoint effectively does the following:
>>
>> 1) XQuery REST Endpoint - receives an XML document over HTTP POST. Let's
>> call this document B.
>> 2) Searches the database for an existing document, which has an <id> element
>> with the same value as that in document B. Assuming we find a document, let
>> us call that document A.
>> 3) Check the version of document B against document A. The version is
>> indicated in a <version> element in each document respectively. The version
>> of document B should be newer than document A, if not then stop, else
>> continue.
>> 4) Remove document A from the 'live' collection
>> 5) Insert document B into the database and add it to the 'live' collection.
>>
>> Now this REST end-point may be called by many clients in parallel, which
>> means not just adding the new document B, but in parallel running the above
>> query for documents C, D, E ... N. I think we are seeing three separate race
>> conditions appearing:
>>
>> i) Steps (4) and (5) where the same version of the document with the same id
>> can be inserted into the live collection. Typically step (4) tries to ensure
>> there is only one live version by removing the old document (document A)
>> from the live collection, before adding the new document (document B) to the
>> live collection.
>>
>> ii) Steps (3) and (5) where multiple versions can be inserted into the live
>> collection.
>>
>> iii) Steps (3) and (5) where sometimes an older version is inserted after a
>> newer version.
>>
>> I believe that due to the number of client requests, we are effectively
>> seeing threads pre-empt other threads within this query and because no
>> explicit locking has yet been added to the system, we have problems.
>>
>> How can I make the steps (1) through (5) thread-safe?
>>
>> I have tried adding xdmp:transaction-mode "update"; to my REST query, and
>> using an explicit xdmp:commit at the end. This has not helped at all, but I
>> think that is because we are never writing the same document, every document
>> we write in steps (4) and (5) will always have a different URI in the
>> database. I think really that we need to be able to lock based on an
>> abstract uri (e.g. the content of our id element) and not the document uri
>> as that varies over time in our model.
>>
>> I also looked at xdmp:lock-acquire, but it appears the locks are shared for
>> a single user, i.e. it states - "When a user locks a URI, it is locked to
>> other users, but not to the user who locked it". The problem I have here is
>> that this is effectively a public, unauthenticated REST end-point, so it
>> will always be the same user running the query as far as ML is concerned.
>>
>> Does anyone have any suggestions of how we might achieve what we are looking
>> for?
>>
>> Cheers Adam.
>>
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general