Thanks Wayne.
---
Ron Hitchens {[email protected]} +44 7879 358212
On Jun 4, 2014, at 11:12 PM, Wayne Feick <[email protected]> wrote:
> Fair points, Ron. We have RFE 2322 filed back in Feb 2012 to track this. I'll
> add a note indicating your interest as well.
>
> Wayne.
>
>
> On 06/04/2014 03:00 PM, Ron Hitchens wrote:
>>
>> Wayne,
>>
>> Thanks for this. It's a useful code pattern for this sort of thing and I
>> will probably use it for the specific requirement I have at the moment (I
>> was planning to do something similar anyway).
>>
>> But this code, or any user-level code, does not fully implement the
>> uniqueness guarantee I'd like to have and that I think a specialized range
>> index could easily provide. This will work, but as you say it would be
>> necessary to always use this code convention. It would not prevent creation
>> of duplicate values by code that doesn't follow the convention. If
>> uniqueness were enforced by the index, then I could be confident that
>> uniqueness is absolutely guaranteed and I don't need to trust anyone
>> (including my future self) to always follow the same locking protocol.
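
Since a locking convention cannot stop non-conforming code from inserting duplicates, one way to at least detect violations after the fact is a periodic audit of the range index. The following is only a minimal sketch: it assumes an element range index already exists on an element named id (a hypothetical name used for illustration) and that each document is a single fragment. It lists every value that appears in more than one fragment.

```xquery
xquery version "1.0-ml";

(: Assumption: an element range index is configured on <id>.
   The element name is illustrative, not taken from a real schema. :)
for $value in cts:element-values(xs:QName("id"), (), ("fragment-frequency"))
where cts:frequency($value) gt 1  (: value occurs in more than one fragment :)
return $value
```

This detects duplicates but does not prevent them, so it complements the locking convention rather than replacing it.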
>>
>> ---
>> Ron Hitchens {[email protected]} +44 7879 358212
>>
>> On Jun 4, 2014, at 9:19 PM, Wayne Feick <[email protected]> wrote:
>>
>>> The simplest approach is to have the document URI correspond to the element
>>> value, and if you can use a random value it's good for concurrency.
>>>
>>> If you can't do that, but you want to ensure only one document can have a
>>> particular value for an element, I think it's pretty easy using
>>> xdmp:lock-for-update() on a URI that corresponds to the element value. You
>>> don't actually need to create a document at that URI, just use it to
>>> serialize transactions. Here's one way to do it.
>>> declare function lock-element-value($qn as xs:QName, $v as item())
>>> {
>>>   xdmp:lock-for-update(
>>>     "http://acme.com/"
>>>       || xdmp:hash64(fn:namespace-uri-from-QName($qn))
>>>       || "/"
>>>       || xdmp:hash64(fn:local-name-from-QName($qn))
>>>       || "/"
>>>       (: include the value so each distinct value gets its own lock URI :)
>>>       || xdmp:hash64(fn:string($v)))
>>> };
>>> You'd then do something like the following.
>>> let $lock := lock-element-value($qn, $v)
>>> let $existing := cts:search(fn:collection(),
>>>   cts:element-range-query($qn, "=", $v), "unfiltered")
>>> return
>>>   if (fn:exists($existing))
>>>   then ... do whatever you need to do with the existing document
>>>   else ... create a new document, safe from a race with another transaction
>>> You'd want to use lock-element-value() in any update that could change the
>>> element value (insert, update, delete). I think you could get away with
>>> ignoring deletes since those would automatically serialize with any
>>> transaction that would modify the existing document.
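
Putting the lock and the check together, the pattern described above might look roughly like the following. This is a sketch under stated assumptions, not a tested implementation: local:insert-unique, the local:DUPLICATE-VALUE error name, the example document URI, and the id element are all hypothetical, and an element range index on id is assumed to exist.

```xquery
xquery version "1.0-ml";

(: Serialize transactions on a URI derived from the QName and the value.
   No document is ever created at this URI. :)
declare function local:lock-element-value(
  $qn as xs:QName, $v as xs:anyAtomicType
) as empty-sequence()
{
  xdmp:lock-for-update(
    "http://acme.com/"
      || xdmp:hash64(fn:string(fn:namespace-uri-from-QName($qn)))
      || "/" || xdmp:hash64(fn:local-name-from-QName($qn))
      || "/" || xdmp:hash64(fn:string($v)))
};

(: Hypothetical helper: insert $doc only if no document already holds $v. :)
declare function local:insert-unique(
  $uri as xs:string, $doc as element(),
  $qn as xs:QName, $v as xs:anyAtomicType
) as empty-sequence()
{
  let $_ := local:lock-element-value($qn, $v)
  return
    if (xdmp:exists(cts:search(fn:collection(),
          cts:element-range-query($qn, "=", $v), "unfiltered")))
    then fn:error(xs:QName("local:DUPLICATE-VALUE"),
                  "Value already in use: " || fn:string($v))
    else xdmp:document-insert($uri, $doc)
};

(: Example call with illustrative values. :)
local:insert-unique(
  "/idregistry/id-fluffy-01.xml",
  <registered-id><id>fluffy-01</id></registered-id>,
  xs:QName("id"), "fluffy-01")
```

Raising an error on a duplicate is just one choice; a caller could equally return the existing document or retry with a different value.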
>>>
>>> We use this sort of pattern internally to ensure uniqueness of IDs.
>>>
>>> Wayne.
>>>
>>>
>>> On 06/04/2014 12:49 PM, Whitby, Rob wrote:
>>>> I thought 2 simultaneous transactions would both get read locks on the
>>>> uri, then one would get a write lock and the other would fail and retry.
>>>> Maybe I'm missing something though.
>>>>
>>>> But anyway, I agree unique indexes would be a handy feature. E.g. our docs
>>>> have a DOI element which *should* be unique but occasionally isn't; it
>>>> would be nice to enforce that rather than have to code defensively.
>>>>
>>>> Rob
>>>> ________________________________________
>>>> From: [email protected]
>>>> [[email protected]] on behalf of Ron Hitchens
>>>> [[email protected]]
>>>> Sent: 04 June 2014 19:31
>>>> To: MarkLogic Developer Discussion
>>>> Subject: Re: [MarkLogic Dev General] New Feature Request: Unique Value
>>>> Range Indexes
>>>>
>>>> Rob,
>>>>
>>>> I believe there is a race condition here. A document may not exist
>>>> as of the timestamp when this request starts running, but some other
>>>> request could create one while it's running. This request would then
>>>> overwrite that document.
>>>>
>>>> I'm actually more concerned about element values inside documents than
>>>> generating unique document URIs. It's easy to generate document URIs with
>>>> 64-bit random numbers that are very unlikely to collide. But I want to
>>>> guarantee that some meaningful value inside a document is unique across
>>>> all documents.
>>>>
>>>> In my case, the naming space is actually quite small because I want the
>>>> IDs to be meaningful but unique. For example "images:cats:fluffy:XX.png",
>>>> where XX can increment or be set randomly until the ID is unique. One way
>>>> to check for uniqueness is to make the document URI from this ID, then
>>>> test for an existing document.
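
The URI-based check described above can be sketched as follows, taking a write lock on the candidate URI before testing it so that another request cannot create the document in between. This is a minimal sketch with hypothetical names (local:mint-id, the /idregistry/ prefix, the registered-id element), and it uses a plain incrementing counter in place of XX.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: try $stem:1.png, $stem:2.png, ... until one is free. :)
declare function local:mint-id($stem as xs:string, $n as xs:integer) as xs:string
{
  let $id  := $stem || ":" || fn:string($n) || ".png"
  let $uri := "/idregistry/" || $id
  (: Lock first, so no other transaction can create $uri between the
     existence test and the insert. :)
  let $_   := xdmp:lock-for-update($uri)
  return
    if (fn:doc-available($uri))
    then local:mint-id($stem, $n + 1)
    else (
      xdmp:document-insert($uri,
        <registered-id><id>{ $id }</id></registered-id>),
      $id
    )
};

local:mint-id("images:cats:fluffy", 1)
```

Note that locks taken on rejected candidates are held until the transaction completes, so a long run of collisions would serialize against other requests minting IDs with the same stem.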
>>>>
>>>> But this doesn't solve the general problem. I could conceivably have
>>>> multiple elements in the document that I want to be unique. To check for
>>>> unique element values it's necessary to run a cts query against the
>>>> element(s). And I'm not sure if you can completely close the race window
>>>> between checking for an existing instance and inserting a new one if the
>>>> query comes back empty.
>>>>
>>>> Someone from ML pointed out privately that checking for uniqueness in
>>>> the index would require cross-cluster communication. I'm sure that's
>>>> true, but I'm also pretty sure that any user-level code solution is going
>>>> to be far less efficient. I'd be happy to pay that ingestion time penalty
>>>> for the guarantee that indexed element values are unique. At query time,
>>>> such a unique value index should perform like any other range index.
>>>>
>>>> ---
>>>> Ron Hitchens {[email protected]} +44 7879 358212
>>>>
>>>> On Jun 4, 2014, at 6:59 PM, "Whitby, Rob" <[email protected]> wrote:
>>>>
>>>>> How about something like this?
>>>>>
>>>>> declare function unique-uri() {
>>>>> let $uri := "/doc/" || xdmp:random() || ".xml"
>>>>> return if (fn:not(fn:doc-available($uri))) then $uri else unique-uri()
>>>>> };
>>>>>
>>>>> I guess because indexes are distributed across forests, ensuring
>>>>> uniqueness is not that easy?
>>>>>
>>>>> Rob
>>>>> ________________________________________
>>>>> From: [email protected]
>>>>> [[email protected]] on behalf of Ron Hitchens
>>>>> [[email protected]]
>>>>> Sent: 04 June 2014 18:01
>>>>> To: MarkLogic Developer Discussion
>>>>> Subject: [MarkLogic Dev General] New Feature Request: Unique Value Range
>>>>> Indexes
>>>>>
>>>>> I'm working on a project, one aspect of which requires minting unique
>>>>> IDs and ensuring that no two documents with the same ID wind up in the
>>>>> database. I know how to accomplish this using locks (I'm pretty sure),
>>>>> but any such implementation is awkward, prone to subtle edge-case
>>>>> errors, and can be difficult to test.
>>>>>
>>>>> It seems to me that this is something that MarkLogic could do much more
>>>>> reliably and quickly than any user-level code. The thought that occurred
>>>>> to me is a variation on range indexes which only allow a single instance
>>>>> of any given value.
>>>>>
>>>>> Conventional range indexes work by creating term lists that look like
>>>>> this (see Jason Hunter's ML Architecture paper), where each term list
>>>>> contains an element (or attribute) value and a list of fragment IDs where
>>>>> that term exists.
>>>>>
>>>>> aardvark | 23, 135, 469, 611
>>>>> ant | 23, 469, 558, 611, 750
>>>>> baboon | 53, 97, 469, 621
>>>>> etc...
>>>>>
>>>>> By making a range index like this but which only allows a single
>>>>> fragment ID in the list, that would ensure that no two documents in the
>>>>> database contain a given element with the same value. That is,
>>>>> attempting to add a second document with the same element or attribute
>>>>> value would cause an exception. And being a range index, it would
>>>>> provide a fast lexicon of all the current unique values in the DB.
>>>>>
>>>>> Such an index would look something like this:
>>>>>
>>>>> abc3vk34 | 17
>>>>> bkx46lkd | 52
>>>>> bz1d34nm | 37
>>>>> etc...
>>>>>
>>>>> Usage could be something like this:
>>>>>
>>>>> declare function create-new-id-doc ($id-root as xs:string) as xs:string
>>>>> {
>>>>>   try {
>>>>>     let $id := $id-root || "-" || mylib:random-string(8)
>>>>>     let $uri := "/idregistry/id-" || $id
>>>>>     let $_ :=
>>>>>       xdmp:document-insert ($uri,
>>>>>         <registered-id>
>>>>>           <id>{ $id }</id>
>>>>>           <created>{ fn:current-dateTime() }</created>
>>>>>         </registered-id>)
>>>>>     return $id
>>>>>   } catch ($e) {
>>>>>     (: a duplicate ID would raise an exception; retry with a new suffix :)
>>>>>     create-new-id-doc ($id-root)
>>>>>   }
>>>>> };
>>>>>
>>>>> This doesn't require that I write any (possibly buggy) mutual exclusion
>>>>> code, and I can be confident that once the xdmp:document-insert succeeds,
>>>>> the ID is unique in the database and the type (as configured for the
>>>>> range index) is correct.
>>>>>
>>>>> Any love for Unique Value Range Indexes in the next version of
>>>>> MarkLogic?
>>>>>
>>>>> ---
>>>>> Ron Hitchens {[email protected]} +44 7879 358212
>>>>>
>>>>> _______________________________________________
>>>>> General mailing list
>>>>> [email protected]
>>>>> http://developer.marklogic.com/mailman/listinfo/general
>>>
>>> --
>>> Wayne Feick
>>> Principal Engineer
>>> MarkLogic Corporation
>>> [email protected]
>>> Phone: +1 650 655 2378
>>> www.marklogic.com
>>>
>>> This e-mail and any accompanying attachments are confidential. The
>>> information is intended solely for the use of the individual to whom it is
>>> addressed. Any review, disclosure, copying, distribution, or use of this
>>> e-mail communication by others is strictly prohibited. If you are not the
>>> intended recipient, please notify us immediately by returning this message
>>> to the sender and delete all copies. Thank you for your cooperation.
>>
>>
>>
>
> --
> Wayne Feick
> Principal Engineer
> MarkLogic Corporation
> [email protected]
> Phone: +1 650 655 2378
> www.marklogic.com
>