Re: [MarkLogic Dev General] New Feature Request: Unique Value Range Indexes

Geert Josten Wed, 11 Jun 2014 05:14:36 -0700

I'm sure you will find lots of mentions of this on MarkMail if you search
for "unique id" and "random". MarkLogic was using that same method internally
as well for creating its own objects. The idea is indeed that you run that
check in update mode, in the same transaction in which you plan to do the
insert. The 'lookahead' will take a read lock, which causes writes from
other transactions to wait and retry if necessary.
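
A minimal sketch of that pattern, assuming the check and the insert sit in the same main module so that both run in a single update transaction. The names local:unique-uri and local:insert-with-unique-uri, the "/doc/" URI prefix, and the sample document are illustrative only and not taken from any library.

```xquery
xquery version "1.0-ml";

(: Sketch only: function names and the /doc/ prefix are hypothetical. :)

(: Pick a random URI and retry until no document exists at it. When this
   runs inside an update transaction, the existence check is expected to
   take a read lock on the URI rather than run lock-free at a timestamp. :)
declare function local:unique-uri() as xs:string
{
  let $uri := "/doc/" || xdmp:random() || ".xml"
  return
    if (fn:doc-available($uri))
    then local:unique-uri()
    else $uri
};

(: Check and insert in the same transaction. The call to
   xdmp:document-insert is what makes this module an update. :)
declare function local:insert-with-unique-uri($doc as node()) as xs:string
{
  let $uri := local:unique-uri()
  let $_ := xdmp:document-insert($uri, $doc)
  return $uri
};

local:insert-with-unique-uri(<registered-id><id>example</id></registered-id>)
```

If the check ran in a separate query-mode request and only the insert ran as an update, the read lock would not be held across the two steps and the protection described above would be lost.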


Cheers,
Geert

-----Original message-----
From: [email protected]
[mailto:[email protected]] On behalf of Ron Hitchens
Sent: Thursday 5 June 2014 00:19
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] New Feature Request: Unique Value
Range Indexes


   Unless your unique-uri() function is running in a non-update query, in
which case it runs lock free at a timestamp.  If you're using the pattern of
main code as a query and updates delegated to invoked/eval'ed transactions,
you could get bit by this.  It would work fine the vast majority of the
time, but you wouldn't be protected from someone else's update happening
between your check in the query and the execution of your invoked update.

   DOIs are a perfect example of what I'm talking about.  Or account
numbers, or patient record IDs, or aircraft tail numbers, etc.  The impact
of non-unique record identifiers can range from annoying all the way to
legally/financially costly or even life-threatening if you're managing
medication records, for example.

---
Ron Hitchens {[email protected]}  +44 7879 358212

On Jun 4, 2014, at 8:49 PM, "Whitby, Rob" <[email protected]> wrote:

> I thought 2 simultaneous transactions would both get read locks on the
uri, then one would get a write lock and the other would fail and retry.
Maybe I'm missing something though.
> 
> But anyway, I agree unique indexes would be a handy feature. e.g. our docs
have a DOI element which *should* be unique but occasionally aren't, would
be nice to enforce that rather than have to code defensively.
> 
> Rob
> ________________________________________
> From: [email protected]
[[email protected]] on behalf of Ron Hitchens
[[email protected]]
> Sent: 04 June 2014 19:31
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] New Feature Request: Unique Value
Range Indexes
> 
> Rob,
> 
>   I believe there is a race condition here.  A document may not exist as-of
the timestamp when this request starts running, but some other request could
create one while it's running.  This request would then over-write that
document.
> 
>   I'm actually more concerned about element values inside documents than
generating unique document URIs.  It's easy to generate document URIs with
64-bit random numbers that are very unlikely to collide.  But I want to
guarantee that some meaningful value inside a document is unique across all
documents.
> 
>   In my case, the naming space is actually quite small because I want the
IDs to be meaningful but unique.  For example "images:cats:fluffy:XX.png",
where XX can increment or be set randomly until the ID is unique.  One way
to check for uniqueness is to make the document URI from this ID, then test
for an existing document.
> 
>   But this doesn't solve the general problem.  I could conceivably have
multiple elements in the document that I want to be unique.  To check for
unique element values it's necessary to run a cts query against the
element(s).  And I'm not sure if you can completely close the race window
between checking for an existing instance and inserting a new one if the
query comes back empty.
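
One way the race window might be narrowed in user-level code, offered as a sketch rather than a confirmed solution: take a write lock on a synthetic URI derived from the value before running the check, so that competing transactions minting the same ID are serialized. The function name local:insert-if-id-unused, the "/locks/id/" URI prefix, and the element name "id" are assumptions made for illustration.

```xquery
xquery version "1.0-ml";

(: Sketch only: the function name, the /locks/id/ prefix and the <id>
   element are hypothetical choices for this example. :)
declare function local:insert-if-id-unused(
  $uri as xs:string,
  $id  as xs:string
) as xs:boolean
{
  (: Write-lock a URI derived from the ID. No document needs to exist at
     that URI. Another transaction asking for the same lock waits until
     this one commits or rolls back. :)
  let $_ := xdmp:lock-for-update("/locks/id/" || $id)

  (: With the lock held, look for any document already carrying the ID. :)
  let $taken :=
    fn:exists(
      cts:search(fn:doc(),
        cts:element-value-query(xs:QName("id"), $id, "exact")))
  return
    if ($taken)
    then fn:false()
    else (
      xdmp:document-insert($uri,
        <registered-id>
          <id>{ $id }</id>
          <created>{ fn:current-dateTime() }</created>
        </registered-id>),
      fn:true()
    )
};

local:insert-if-id-unused("/idregistry/id-cats-fluffy-01", "cats-fluffy-01")
```

This only helps if every code path that writes an ID takes the same lock first, which is exactly the kind of convention a built-in unique index would make unnecessary.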
> 
>   Someone from ML pointed out privately that checking for uniqueness in
the index would require cross-cluster communication.  I'm sure that's true,
but I'm also pretty sure that any user-level code solution is going to be
far less efficient.  I'd be happy to pay that ingestion time penalty for the
guarantee that indexed element values are unique.  At query time, such a
unique value index should perform like any other range index.
> 
> ---
> Ron Hitchens {[email protected]}  +44 7879 358212
> 
> On Jun 4, 2014, at 6:59 PM, "Whitby, Rob" <[email protected]> wrote:
> 
>> How about something like this?
>> 
>> declare function unique-uri() {
>> let $uri := "/doc/" || xdmp:random() || ".xml"
>> return if (fn:not(fn:doc-available($uri))) then $uri else unique-uri()
>> };
>> 
>> I guess because indexes are distributed across forests, ensuring
uniqueness is not that easy?
>> 
>> Rob
>> ________________________________________
>> From: [email protected]
[[email protected]] on behalf of Ron Hitchens
[[email protected]]
>> Sent: 04 June 2014 18:01
>> To: MarkLogic Developer Discussion
>> Subject: [MarkLogic Dev General] New Feature Request: Unique Value Range
Indexes
>> 
>>  I'm working on a project, one aspect of which requires minting unique
IDs and assuring that no two documents with the same ID wind up in the
database.  I know how to accomplish this using locks (I'm pretty sure) but
any such implementation is awkward and prone to subtle edge case errors, and
can be difficult to test.
>> 
>>  It seems to me that this is something that MarkLogic could do much more
reliably and quickly than any user-level code.  The thought that occurred to
me is a variation on range indexes which only allow a single instance of any
given value.
>> 
>>  Conventional range indexes work by creating term lists that look like
this (see Jason Hunter's ML Architecture paper), where each term list
contains an element (or attribute) value and a list of fragment IDs where
that term exists.
>> 
>> aardvark | 23, 135, 469, 611
>> ant      | 23, 469, 558, 611, 750
>> baboon   | 53, 97, 469, 621
>> etc...
>> 
>>  By making a range index like this but which only allows a single
fragment ID in the list, that would ensure that no two documents in the
database contain a given element with the same value.  That is, attempting
to add a second document with the same element or attribute value would
cause an exception.  And being a range index, it would provide a fast
lexicon of all the current unique values in the DB.
>> 
>>  Such an index would look something like this:
>> 
>> abc3vk34 | 17
>> bkx46lkd | 52
>> bz1d34nm | 37
>> etc...
>> 
>>  Usage could be something like this:
>> 
>> declare function create-new-id-doc ($id-root as xs:string) as xs:string
>> {
>>   try {
>>       let $id := $id-root || "-" || mylib:random-string(8)
>>       let $uri := "/idregistry/id-" || $id
>>       let $_ :=
>>           xdmp:document-insert ($uri,
>>               <registered-id>
>>                   <id>{ $id }</id>
>>                   <created>{ fn:current-dateTime() }</created>
>>               </registered-id>)
>>       return $id
>>   } catch ($e) {
>>       (: a duplicate ID would raise an exception; retry with a new one :)
>>       create-new-id-doc ($id-root)
>>   }
>> };
>> 
>>  This doesn't require that I write any (possibly buggy) mutual exclusion
code and I can be confident that once the xdmp:document-insert succeeds that
the ID is unique in the database and that the type (as configured for the
range index) is correct.
>> 
>>  Any love for Unique Value Range Indexes in the next version of
MarkLogic?
>> 
>> ---
>> Ron Hitchens {[email protected]}  +44 7879 358212
>> 
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
