Thanks Wayne.

---
Ron Hitchens {[email protected]}  +44 7879 358212

On Jun 4, 2014, at 11:12 PM, Wayne Feick <[email protected]> wrote:

> Fair points, Ron. We have RFE 2322 filed back in Feb 2012 to track this. I'll 
> add a note indicating your interest as well.
> 
> Wayne.
> 
> 
> On 06/04/2014 03:00 PM, Ron Hitchens wrote:
>> 
>> Wayne,
>> 
>>    Thanks for this.  It's a useful code pattern for this sort of thing and I 
>> will probably use it for the specific requirement I have at the moment (I 
>> was planning to do something similar anyway).
>> 
>>    But this code, or any user-level code, does not fully implement the 
>> uniqueness guarantee I'd like to have and that I think a specialized range 
>> index could easily provide.  This will work, but as you say it would be 
>> necessary to always use this code convention.  It would not prevent creation 
>> of duplicate values by code that doesn't follow the convention.  If 
>> uniqueness were enforced by the index, then I could be confident that 
>> uniqueness is absolutely guaranteed and I don't need to trust anyone 
>> (including my future self) to always follow the same locking protocol.
>> 
>> ---
>> Ron Hitchens {[email protected]}  +44 7879 358212
>> 
>> On Jun 4, 2014, at 9:19 PM, Wayne Feick <[email protected]> wrote:
>> 
>>> The simplest is to have the document URI correspond to the element value, 
>>> and if you can use a random value it's good for concurrency.
>>> 
>>> If you can't do that, but you want to ensure only one document can have a 
>>> particular value for an element, I think it's pretty easy using 
>>> xdmp:lock-for-update() on a URI that corresponds to the element value. You 
>>> don't actually need to create a document at that URI, just use it to 
>>> serialize transactions. Here's one way to do it.
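>>> 
>>> declare function lock-element-value($qn as xs:QName, $v as item())
>>> {
>>>   (: Lock a synthetic URI derived from the element name and its value :)
>>>   xdmp:lock-for-update(
>>>     "http://acme.com/"
>>>     || xdmp:hash64(fn:namespace-uri-from-QName($qn))
>>>     || "/"
>>>     || xdmp:hash64(fn:local-name-from-QName($qn))
>>>     || "/"
>>>     || xdmp:hash64(fn:string($v)))
>>> };
>>> 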
>>> You'd then do something like the following.
>>> 
>>> let $lock := lock-element-value($qn, $v)
>>> let $existing := cts:search(fn:collection(),
>>>   cts:element-range-query($qn, "=", $v), "unfiltered")
>>> return
>>>   if (fn:exists($existing))
>>>   then ... do whatever you need to do with the existing document
>>>   else ... create a new document, safe from a race with another transaction
>>> 
>>> You'd want to use lock-element-value() in any update that could change 
>>> the element value (insert, update, delete). I think you could get away 
>>> with ignoring deletes since those would automatically serialize with any 
>>> transaction that would modify the existing document.
>>> 
>>> We use this sort of pattern internally to ensure uniqueness of IDs.
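>>> 
>>> A minimal sketch of how the pieces above might fit together, assuming a 
>>> range index exists on the element. The local: function names, the 
>>> caller-supplied document URI, and the sample <registered-id> document are 
>>> hypothetical, chosen only for illustration; this is not a definitive 
>>> implementation.
>>> 
>>> xquery version "1.0-ml";
>>> 
>>> (: Hypothetical helper: lock a synthetic URI built from name and value :)
>>> declare function local:lock-element-value(
>>>   $qn as xs:QName, $v as xs:anyAtomicType)
>>> {
>>>   xdmp:lock-for-update(
>>>     "http://acme.com/"
>>>     || xdmp:hash64(fn:namespace-uri-from-QName($qn)) || "/"
>>>     || xdmp:hash64(fn:local-name-from-QName($qn)) || "/"
>>>     || xdmp:hash64(fn:string($v)))
>>> };
>>> 
>>> (: Hypothetical: insert $doc at $uri only if no document holds $v in $qn.
>>>    Returns true if the document was inserted, false otherwise. :)
>>> declare function local:insert-if-unique(
>>>   $uri as xs:string, $doc as element(),
>>>   $qn as xs:QName, $v as xs:anyAtomicType) as xs:boolean
>>> {
>>>   let $lock := local:lock-element-value($qn, $v)
>>>   let $existing := cts:search(fn:collection(),
>>>     cts:element-range-query($qn, "=", $v), "unfiltered")
>>>   return
>>>     if (fn:exists($existing))
>>>     then fn:false()
>>>     else (xdmp:document-insert($uri, $doc), fn:true())
>>> };
>>> 
>>> local:insert-if-unique(
>>>   "/idregistry/id-example-01",
>>>   <registered-id><id>example-01</id></registered-id>,
>>>   xs:QName("id"), "example-01")
>>> 
>>> Every writer of that element would still have to go through the same 
>>> function for the guarantee to hold.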
>>> 
>>> Wayne.
>>> 
>>> 
>>> On 06/04/2014 12:49 PM, Whitby, Rob wrote:
>>>> I thought 2 simultaneous transactions would both get read locks on the 
>>>> uri, then one would get a write lock and the other would fail and retry. 
>>>> Maybe I'm missing something though.
>>>> 
>>>> But anyway, I agree unique indexes would be a handy feature. e.g. our docs 
>>>> have a DOI element which *should* be unique but occasionally isn't; it 
>>>> would be nice to enforce that rather than have to code defensively.
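>>>> 
>>>> Until something like that exists, existing duplicates can at least be 
>>>> found after the fact. A minimal sketch, assuming a string range index on 
>>>> the element; the element name "doi" is a placeholder, not the real 
>>>> schema:
>>>> 
>>>> xquery version "1.0-ml";
>>>> 
>>>> (: Assumes a string range index on a hypothetical <doi> element :)
>>>> for $v in cts:element-values(xs:QName("doi"), (), "fragment-frequency")
>>>> where cts:frequency($v) gt 1
>>>> return $v || " appears in " || cts:frequency($v) || " fragments"
>>>> 
>>>> This only reports values that occur in more than one fragment; it does 
>>>> not prevent them.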
>>>> 
>>>> Rob
>>>> ________________________________________
>>>> From: [email protected] [[email protected]] on behalf of Ron Hitchens [[email protected]]
>>>> Sent: 04 June 2014 19:31
>>>> To: MarkLogic Developer Discussion
>>>> Subject: Re: [MarkLogic Dev General] New Feature Request: Unique Value Range Indexes
>>>> 
>>>> Rob,
>>>> 
>>>>    I believe there is a race condition here.  A document may not exist 
>>>> as-of the timestamp when this request starts running, but some other 
>>>> request could create one while it's running.  This request would then 
>>>> overwrite that document.
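>>>> 
>>>>    One way that window might be closed, as a sketch only: take a write 
>>>> lock on the candidate URI before testing for it, so the check and the 
>>>> later insert are serialized against other transactions.  The name 
>>>> local:unique-uri is hypothetical, and this assumes the caller inserts 
>>>> the document in the same update transaction:
>>>> 
>>>> xquery version "1.0-ml";
>>>> 
>>>> declare function local:unique-uri() as xs:string
>>>> {
>>>>   let $uri := "/doc/" || xdmp:random() || ".xml"
>>>>   (: Write-lock the candidate URI before checking whether it exists :)
>>>>   let $lock := xdmp:lock-for-update($uri)
>>>>   return
>>>>     if (fn:doc-available($uri))
>>>>     then local:unique-uri()   (: collision: try another random URI :)
>>>>     else $uri
>>>> };
>>>> 
>>>> xdmp:document-insert(local:unique-uri(), <doc/>)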
>>>> 
>>>>    I'm actually more concerned about element values inside documents than 
>>>> generating unique document URIs.  It's easy to generate document URIs with 
>>>> 64-bit random numbers that are very unlikely to collide.  But I want to 
>>>> guarantee that some meaningful value inside a document is unique across 
>>>> all documents.
>>>> 
>>>>    In my case, the naming space is actually quite small because I want the 
>>>> IDs to be meaningful but unique.  For example "images:cats:fluffy:XX.png", 
>>>> where XX can increment or be set randomly until the ID is unique.  One way 
>>>> to check for uniqueness is to make the document URI from this ID, then 
>>>> test for an existing document.
>>>> 
>>>>    But this doesn't solve the general problem.  I could conceivably have 
>>>> multiple elements in the document that I want to be unique.  To check for 
>>>> unique element values it's necessary to run a cts query against the 
>>>> element(s).  And I'm not sure if you can completely close the race window 
>>>> between checking for an existing instance and inserting a new one if the 
>>>> query comes back empty.
>>>> 
>>>>    Someone from ML pointed out privately that checking for uniqueness in 
>>>> the index would require cross-cluster communication.  I'm sure that's 
>>>> true, but I'm also pretty sure that any user-level code solution is going 
>>>> to be far less efficient.  I'd be happy to pay that ingestion time penalty 
>>>> for the guarantee that indexed element values are unique.  At query time, 
>>>> such a unique value index should perform like any other range index.
>>>> 
>>>> ---
>>>> Ron Hitchens {[email protected]}  +44 7879 358212
>>>> 
>>>> On Jun 4, 2014, at 6:59 PM, "Whitby, Rob" <[email protected]> wrote:
>>>> 
>>>>> How about something like this?
>>>>> 
>>>>> declare function unique-uri() {
>>>>>  let $uri := "/doc/" || xdmp:random() || ".xml"
>>>>>  return if (fn:not(fn:doc-available($uri))) then $uri else unique-uri()
>>>>> };
>>>>> 
>>>>> I guess because indexes are distributed across forests, ensuring 
>>>>> uniqueness is not that easy?
>>>>> 
>>>>> Rob
>>>>> ________________________________________
>>>>> From: [email protected] [[email protected]] on behalf of Ron Hitchens [[email protected]]
>>>>> Sent: 04 June 2014 18:01
>>>>> To: MarkLogic Developer Discussion
>>>>> Subject: [MarkLogic Dev General] New Feature Request: Unique Value Range Indexes
>>>>> 
>>>>>   I'm working on a project, one aspect of which requires minting unique 
>>>>> IDs and assuring that no two documents with the same ID wind up in the 
>>>>> database.  I know how to accomplish this using locks (I'm pretty sure) 
>>>>> but any such implementation is awkward and prone to subtle edge case 
>>>>> errors, and can be difficult to test.
>>>>> 
>>>>>   It seems to me that this is something that MarkLogic could do much more 
>>>>> reliably and quickly than any user-level code.  The thought that occurred 
>>>>> to me is a variation on range indexes which only allow a single instance 
>>>>> of any given value.
>>>>> 
>>>>>   Conventional range indexes work by creating term lists that look like 
>>>>> this (see Jason Hunter's ML Architecture paper), where each term list 
>>>>> contains an element (or attribute) value and a list of fragment IDs where 
>>>>> that term exists.
>>>>> 
>>>>> aardvark | 23, 135, 469, 611
>>>>> ant      | 23, 469, 558, 611, 750
>>>>> baboon   | 53, 97, 469, 621
>>>>> etc...
>>>>> 
>>>>>   By making a range index like this but which only allows a single 
>>>>> fragment ID in the list, that would ensure that no two documents in the 
>>>>> database contain a given element with the same value.  That is, 
>>>>> attempting to add a second document with the same element or attribute 
>>>>> value would cause an exception.  And being a range index, it would 
>>>>> provide a fast lexicon of all the current unique values in the DB.
>>>>> 
>>>>>   Such an index would look something like this:
>>>>> 
>>>>> abc3vk34 | 17
>>>>> bkx46lkd | 52
>>>>> bz1d34nm | 37
>>>>> etc...
>>>>> 
>>>>>   Usage could be something like this:
>>>>> 
>>>>> declare function create-new-id-doc ($id-root as xs:string) as xs:string
>>>>> {
>>>>>    try {
>>>>>        let $id := $id-root || "-" || mylib:random-string(8)
>>>>>        let $uri := "/idregistry/id-" || $id
>>>>>        let $_ :=
>>>>>            xdmp:document-insert ($uri,
>>>>>                <registered-id>
>>>>>                    <id>{ $id }</id>
>>>>>                    <created>{ fn:current-dateTime() }</created>
>>>>>                </registered-id>)
>>>>>        return $id
>>>>>    } catch ($e) {
>>>>>        (: assumed: a duplicate ID makes the insert throw; retry :)
>>>>>        create-new-id-doc ($id-root)
>>>>>    }
>>>>> };
>>>>> 
>>>>>   This doesn't require that I write any (possibly buggy) mutual exclusion 
>>>>> code and I can be confident that once the xdmp:document-insert succeeds 
>>>>> that the ID is unique in the database and that the type (as configured 
>>>>> for the range index) is correct.
>>>>> 
>>>>>   Any love for Unique Value Range Indexes in the next version of 
>>>>> MarkLogic?
>>>>> 
>>>>> ---
>>>>> Ron Hitchens {[email protected]}  +44 7879 358212
>>>>> 
>>>>> _______________________________________________
>>>>> General mailing list
>>>>> [email protected]
>>>>> http://developer.marklogic.com/mailman/listinfo/general
>>> 
>>> -- 
>>> Wayne Feick
>>> Principal Engineer
>>> MarkLogic Corporation
>>> [email protected]
>>> Phone: +1 650 655 2378
>>> www.marklogic.com
>>> 
>>> This e-mail and any accompanying attachments are confidential. The 
>>> information is intended solely for the use of the individual to whom it is 
>>> addressed. Any review, disclosure, copying, distribution, or use of this 
>>> e-mail communication by others is strictly prohibited. If you are not the 
>>> intended recipient, please notify us immediately by returning this message 
>>> to the sender and delete all copies. Thank you for your cooperation.
> 

