The general topic of generating unique IDs is a lot older still. I like the
idea of the database being able to impose a uniqueness constraint on
anything stored in it. It is much more difficult to guarantee that code is
behaving correctly than to impose such an assertion.

 

An interesting thought to use (range) indexes for that; I hadn't heard that
one before!

 

Cheers,

Geert

 

From: [email protected]
[mailto:[email protected]] On Behalf Of Wayne Feick
Sent: Thursday, 5 June 2014 00:12
To: [email protected]
Subject: Re: [MarkLogic Dev General] New Feature Request: Unique Value
Range Indexes

 

Fair points, Ron. We have RFE 2322 filed back in Feb 2012 to track this.
I'll add a note indicating your interest as well.

Wayne.



On 06/04/2014 03:00 PM, Ron Hitchens wrote:

 

Wayne, 

 

   Thanks for this.  It's a useful code pattern for this sort of thing and I
will probably use it for the specific requirement I have at the moment (I
was planning to do something similar anyway).

 

   But this code, or any user-level code, does not fully implement the
uniqueness guarantee I'd like to have and that I think a specialized range
index could easily provide.  This will work, but as you say it would be
necessary to always use this code convention.  It would not prevent creation
of duplicate values by code that doesn't follow the convention.  If
uniqueness were enforced by the index, then I could be confident that
uniqueness is absolutely guaranteed and I don't need to trust anyone
(including my future self) to always follow the same locking protocol.


---

Ron Hitchens {[email protected]}  +44 7879 358212

 

On Jun 4, 2014, at 9:19 PM, Wayne Feick <[email protected]> wrote:





The simplest approach is to have the document URI correspond to the element
value, and if you can use a random value, that is also good for concurrency.

If you can't do that, but you want to ensure that only one document can have
a particular value for an element, I think it's pretty easy using
xdmp:lock-for-update() on a URI that corresponds to the element value. You
don't actually need to create a document at that URI; just use it to
serialize transactions. Here's one way to do it.

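declare function local:lock-element-value($qn as xs:QName, $v as xs:anyAtomicType)
  as empty-sequence()
{
  (: Lock a URI derived from the element QName *and* the value. No document
     needs to exist at this URI; the lock only serializes transactions that
     touch the same element value. :)
  xdmp:lock-for-update(
    "http://acme.com/"
    || xdmp:hash64(fn:namespace-uri-from-QName($qn))
    || "/"
    || xdmp:hash64(fn:local-name-from-QName($qn))
    || "/"
    || xdmp:hash64(fn:string($v)))
};

You'd then do something like the following.

(: cts:element-range-query requires an element range index on $qn :)
let $lock := local:lock-element-value($qn, $v)
let $existing := cts:search(fn:collection(),
                   cts:element-range-query($qn, "=", $v), "unfiltered")
return
  if (fn:exists($existing))
  then (: ... do whatever you need to do with the existing document ... :)
       $existing
  else (: ... create a new document, safe from a race with another transaction ... :)
       ()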

You'd want to use lock-element-value() in any update that could change the
element value (insert, update, delete). I think you could get away with
ignoring deletes, since those would automatically serialize with any
transaction that would modify the existing document.
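For readers who want to see the serialization idea outside MarkLogic, here is a minimal in-memory model in Python (not MarkLogic code). It assumes that a per-key `threading.Lock` is a fair stand-in for `xdmp:lock-for-update()` on a URI derived from the QName and value, and that a plain dict stands in for the database plus the range-index lookup. `ValueLockRegistry` and its methods are made-up names for illustration.

```python
import threading
from collections import defaultdict


class ValueLockRegistry:
    """Toy model: serialize check-then-insert on a key derived from a value.

    Each per-key lock plays the role of xdmp:lock-for-update() on a URI that
    corresponds to (element QName, value). Nothing is stored under the key;
    it exists only so that writers of the same value run one at a time.
    """

    def __init__(self):
        self._table_guard = threading.Lock()       # protects the lock table
        self._locks = defaultdict(threading.Lock)  # (ns, local, value) -> lock
        self.owner = {}                            # (ns, local, value) -> uri

    def _lock_for(self, key):
        with self._table_guard:
            return self._locks[key]

    def insert_if_absent(self, uri, namespace, local_name, value):
        """Record `uri` as holding `value` unless another document already does."""
        key = (namespace, local_name, value)
        with self._lock_for(key):
            if key in self.owner:      # stands in for the cts:search check
                return False
            self.owner[key] = uri      # stands in for xdmp:document-insert()
            return True
```

Concurrent writers of the same value queue on the same lock, so exactly one insert wins. Writers of different values take different locks and do not block each other, which is the concurrency benefit of folding the value into the lock URI.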

We use this sort of pattern internally to ensure uniqueness of IDs.

Wayne.



On 06/04/2014 12:49 PM, Whitby, Rob wrote:

I thought two simultaneous transactions would both get read locks on the URI;
then one would get a write lock and the other would fail and retry. Maybe
I'm missing something, though.
 
But anyway, I agree unique indexes would be a handy feature. For example, our
docs have a DOI element which *should* be unique but occasionally isn't; it
would be nice to enforce that rather than having to code defensively.
 
Rob
________________________________________
From: [email protected] [[email protected]] on behalf of Ron Hitchens [[email protected]]
Sent: 04 June 2014 19:31
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] New Feature Request: Unique Value Range Indexes
 
Rob,
 
   I believe there is a race condition here.  A document may not exist as of
the timestamp when this request starts running, but some other request could
create one while it's running.  This request would then overwrite that
document.
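The race described here can be reproduced deterministically with a toy model in Python. This deliberately models only the scenario above: each request checks against a snapshot taken when it starts, and nothing serializes the writes. It does not model MarkLogic's actual locking, which Rob raises in his reply. `SnapshotStore` and `check_then_insert` are hypothetical names.

```python
class SnapshotStore:
    """Toy store: each request reads a frozen copy taken when it starts."""

    def __init__(self):
        self.committed = {}

    def begin(self):
        # The request's read view, fixed at its start.
        return dict(self.committed)

    def commit(self, uri, doc):
        # No uniqueness check and no locking: the last writer wins.
        self.committed[uri] = doc


def check_then_insert(store, snapshot, uri, doc):
    """Write `doc` only if `uri` looks unused in this request's snapshot."""
    if uri in snapshot:
        return False
    store.commit(uri, doc)
    return True


store = SnapshotStore()
snap_a = store.begin()   # request A starts
snap_b = store.begin()   # request B starts before A has committed
a_wrote = check_then_insert(store, snap_a, "/doc/42.xml", "from A")
b_wrote = check_then_insert(store, snap_b, "/doc/42.xml", "from B")
# Both checks passed, and B silently replaced A's document.
print(a_wrote, b_wrote, store.committed["/doc/42.xml"])  # True True from B
```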
 
   I'm actually more concerned about element values inside documents than
generating unique document URIs.  It's easy to generate document URIs with
64-bit random numbers that are very unlikely to collide.  But I want to
guarantee that some meaningful value inside a document is unique across all
documents.
 
   In my case, the naming space is actually quite small because I want the
IDs to be meaningful but unique.  For example "images:cats:fluffy:XX.png",
where XX can increment or be set randomly until the ID is unique.  One way
to check for uniqueness is to make the document URI from this ID, then test
for an existing document.
 
   But this doesn't solve the general problem.  I could conceivably have
multiple elements in the document that I want to be unique.  To check for
unique element values it's necessary to run a cts query against the
element(s).  And I'm not sure if you can completely close the race window
between checking for an existing instance and inserting a new one if the
query comes back empty.
 
   Someone from ML pointed out privately that checking for uniqueness in the
index would require cross-cluster communication.  I'm sure that's true, but
I'm also pretty sure that any user-level code solution is going to be far
less efficient.  I'd be happy to pay that ingestion time penalty for the
guarantee that indexed element values are unique.  At query time, such a
unique value index should perform like any other range index.
 
 
On Jun 4, 2014, at 6:59 PM, "Whitby, Rob" <[email protected]> wrote:
 

How about something like this?
 
declare function local:unique-uri() as xs:string
{
  (: Pick random URIs until one is not already in use :)
  let $uri := "/doc/" || xdmp:random() || ".xml"
  return
    if (fn:not(fn:doc-available($uri)))
    then $uri
    else local:unique-uri()
};
 
I guess because indexes are distributed across forests, ensuring uniqueness
is not that easy?
 
Rob
________________________________________
From: [email protected] [[email protected]] on behalf of Ron Hitchens [[email protected]]
Sent: 04 June 2014 18:01
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] New Feature Request: Unique Value Range Indexes
 
  I'm working on a project, one aspect of which requires minting unique IDs
and assuring that no two documents with the same ID wind up in the database.
I know how to accomplish this using locks (I'm pretty sure) but any such
implementation is awkward and prone to subtle edge case errors, and can be
difficult to test.
 
  It seems to me that this is something that MarkLogic could do much more
reliably and quickly than any user-level code.  The thought that occurred to
me is a variation on range indexes which only allow a single instance of any
given value.
 
  Conventional range indexes work by creating term lists that look like this
(see Jason Hunter's ML Architecture paper), where each term list contains an
element (or attribute) value and a list of fragment IDs where that term
exists.
 
aardvark | 23, 135, 469, 611
ant      | 23, 469, 558, 611, 750
baboon   | 53, 97, 469, 621
etc...
 
  By making a range index like this but which only allows a single fragment
ID in the list, that would ensure that no two documents in the database
contain a given element with the same value.  That is, attempting to add a
second document with the same element or attribute value would cause an
exception.  And being a range index, it would provide a fast lexicon of all
the current unique values in the DB.
 
  Such an index would look something like this:
 
abc3vk34 | 17
bkx46lkd | 52
bz1d34nm | 37
etc...
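  To make the proposed structure concrete, here is a toy in-memory model in Python. It is not how MarkLogic actually stores range indexes, and in a real cluster the values would be spread across forests. It only shows the rule the proposal adds: a second fragment ID for an existing value is rejected rather than appended, while the sorted values still serve as a lexicon. `UniqueValueIndex` and its methods are hypothetical names.

```python
from bisect import bisect_left, insort


class UniqueValueIndex:
    """Toy model of the proposed index: each value maps to at most one fragment ID."""

    def __init__(self):
        self._fragment_by_value = {}   # value -> the single fragment ID holding it
        self._sorted_values = []       # kept sorted, so it doubles as a lexicon

    def add(self, value, fragment_id):
        owner = self._fragment_by_value.get(value)
        if owner is None:
            self._fragment_by_value[value] = fragment_id
            insort(self._sorted_values, value)
        elif owner != fragment_id:
            # A conventional range index would append to the list here;
            # the unique variant rejects the second fragment instead.
            raise ValueError(f"value {value!r} already indexed for fragment {owner}")

    def remove(self, value):
        if self._fragment_by_value.pop(value, None) is not None:
            self._sorted_values.pop(bisect_left(self._sorted_values, value))

    def fragment_for(self, value):
        return self._fragment_by_value.get(value)

    def values_from(self, start):
        """Lexicon scan: all indexed values >= start, in order."""
        return self._sorted_values[bisect_left(self._sorted_values, start):]
```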
 
  Usage could be something like this:
 
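declare function local:create-new-id-doc($id-root as xs:string) as xs:string
{
   try {
       (: mylib:random-string() is an application helper, not a built-in :)
       let $id := $id-root || "-" || mylib:random-string(8)
       let $uri := "/idregistry/id-" || $id
       let $_ :=
           xdmp:document-insert($uri,
               <registered-id>
                   <id>{ $id }</id>
                   <created>{ fn:current-dateTime() }</created>
               </registered-id>)
       return $id
   } catch ($e) {
       (: With the proposed unique value range index on <id>, a duplicate
          value would make the insert fail; retry with a new random suffix. :)
       local:create-new-id-doc($id-root)
   }
};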
 
  This doesn't require that I write any (possibly buggy) mutual exclusion
code, and I can be confident that, once the xdmp:document-insert succeeds,
the ID is unique in the database and its type (as configured for the range
index) is correct.
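  The retry loop in this usage example can also be modelled outside MarkLogic. The Python sketch below assumes a store whose insert raises a dedicated error when the id is already present, standing in for the hypothetical unique value range index. `IdRegistry`, `DuplicateValueError`, `random_string` and `create_new_id_doc` are illustrative names only.

```python
import random
import string
from datetime import datetime, timezone


class DuplicateValueError(Exception):
    """Raised by the toy registry when a second document carries an existing id."""


class IdRegistry:
    """Toy database whose (hypothetical) unique index on <id> rejects duplicates."""

    def __init__(self):
        self.docs = {}        # uri -> document
        self._ids = set()     # the unique value index on the id element

    def document_insert(self, uri, doc):
        if doc["id"] in self._ids:
            raise DuplicateValueError(doc["id"])
        self._ids.add(doc["id"])
        self.docs[uri] = doc


ALPHABET = string.ascii_lowercase + string.digits


def random_string(length, rng):
    # Stand-in for the mylib:random-string() helper in the XQuery above.
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def create_new_id_doc(registry, id_root, rng, suffix_len=8, max_attempts=10):
    """Mint and register a new id, retrying only on a uniqueness violation."""
    for _ in range(max_attempts):
        new_id = f"{id_root}-{random_string(suffix_len, rng)}"
        doc = {"id": new_id, "created": datetime.now(timezone.utc).isoformat()}
        try:
            registry.document_insert(f"/idregistry/id-{new_id}", doc)
            return new_id
        except DuplicateValueError:
            continue   # collision: try another random suffix
    raise RuntimeError(f"no free id under {id_root!r} after {max_attempts} attempts")
```

  Unlike the catch-all `catch` in the XQuery, this sketch retries only on the uniqueness error and gives up after a bounded number of attempts, so an unrelated failure is not retried forever. That is a choice made for the sketch, not part of the proposal.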
 
  Any love for Unique Value Range Indexes in the next version of MarkLogic?
 
 
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general





-- 
Wayne Feick
Principal Engineer
MarkLogic Corporation
[email protected]
Phone: +1 650 655 2378
www.marklogic.com
 
This e-mail and any accompanying attachments are confidential. The
information is intended solely for the use of the individual to whom it is
addressed. Any review, disclosure, copying, distribution, or use of this
e-mail communication by others is strictly prohibited. If you are not the
intended recipient, please notify us immediately by returning this message
to the sender and delete all copies. Thank you for your cooperation.


 





