Paul,

This looks good to me, but map-based difference will be much, much faster than 
using a predicate on a sequence as in:
$given-seq-codes[not (. = $valid-codes-given-seq)]

You could also pre-retrieve the existing values as a map via:

cts:element-values($i1-qn, (), ("map"), $e-v-qry)
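A minimal sketch of how that map could be used, assuming a string range index on the element index1 (the name is taken from the query quoted below) and assuming the returned map carries a non-empty value for every key it holds. It looks each given code up in the map rather than comparing it against a sequence; it is an illustration, not a tuned implementation.

```xquery
xquery version "1.0-ml";

(: Assumption: a string range index exists on element "index1",
   as in the query quoted below. :)
let $i1-qn := xs:QName("index1")
let $given-seq-codes := ("222", "333", "newval")
let $e-v-qry := cts:element-value-query($i1-qn, $given-seq-codes)

(: The "map" option goes in the third (options) argument; the result
   is a single map:map keyed by the values found in the lexicon. :)
let $existing := cts:element-values($i1-qn, (), ("map"), $e-v-qry)

(: One map lookup per given code instead of a sequence comparison.
   Assumes every key in $existing has a non-empty value. :)
return $given-seq-codes[fn:empty(map:get($existing, .))]
```

The same result could be obtained by putting the given codes into a second map and subtracting, as suggested earlier in the thread.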

Damon

From: Paul M [mailto:[email protected]]
Sent: Friday, February 08, 2013 11:38 AM
To: Damon Feldman; MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] finding an id that does not exist

xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";
xdmp:query-trace(true()),
let $i1-qn := xs:QName("index1")
let $given-seq-codes := ("222","333", "newval") (: 1k to 10k, with maybe 100 newvals; "newval" is a test new code value :)
let $e-v-qry := cts:element-value-query($i1-qn, ($given-seq-codes))
let $valid-codes-given-seq := cts:element-values($i1-qn, (), (), $e-v-qry) (: 1k to 10k :)
let $all-valid-codes := cts:element-values($i1-qn, ()) (:millions:)
let $new-codes-given-seq:= $given-seq-codes[not (. = $valid-codes-given-seq)]
return ($new-codes-given-seq, "|",$valid-codes-given-seq, "|", $all-valid-codes)

This appears to give the correct response so far, on a small test.

________________________________
From: Damon Feldman <[email protected]>
To: Paul M <[email protected]>; MarkLogic Developer Discussion <[email protected]>
Sent: Friday, February 8, 2013 10:59 AM
Subject: RE: [MarkLogic Dev General] finding an id that does not exist

Paul,

That may not be intractable, depending on the response time you need. E.g., this
runs over 1M values in 10 seconds on my laptop:

let $m1 := map:map()
let $add := for $i in 1 to 1000000 return map:put($m1, xs:string(xdmp:random(1000000)), true())

let $m2 := map:map()
let $add := for $i in 1 to 10000 return map:put($m2, xs:string(xdmp:random(1000000)), true())

return (
  count(map:keys($m1 - $m2)),
  xdmp:elapsed-time()
  )

Yours,
Damon

From: Paul M [mailto:[email protected]]
Sent: Friday, February 08, 2013 10:54 AM
To: Damon Feldman; MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] finding an id that does not exist

Hi Damon,

The number of uniqueIds is fairly high, so the element-values result will be
rather large (1 million+). The control sequence ids will be in the 1k-10k
range and non-sequential. The ids missing from the control sequence will
likely number 100 to 1,000. But I'll check and see.

Thanks..


________________________________
From: Damon Feldman <[email protected]>
To: Paul M <[email protected]>; MarkLogic Developer Discussion <[email protected]>
Sent: Friday, February 8, 2013 9:49 AM
Subject: RE: [MarkLogic Dev General] finding an id that does not exist

Paul,

I believe you can range-index the uniqueId, element or attribute, then call 
cts:element-values() with the option to return data as a map. You can put your 
other sequence into a map also and “subtract” maps via the “-” operator to get
a fast set difference.
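A small sketch of the subtraction on the ids from the original question. The literal sequences here are stand-ins for illustration: in practice the existing ids would come from the range index rather than being typed in.

```xquery
xquery version "1.0-ml";

(: Hypothetical data from the original question: ids present in the
   documents, and the control sequence to check against them. :)
let $existing-ids := ("111", "222", "333", "555")
let $control-ids  := ("111", "222", "333", "444")

(: Load each sequence into a map, using the id as the key. :)
let $existing := map:map()
let $_ := for $id in $existing-ids return map:put($existing, $id, fn:true())

let $control := map:map()
let $_ := for $id in $control-ids return map:put($control, $id, fn:true())

(: Keys in $control that are absent from $existing. :)
return map:keys($control - $existing)
```

For this data the only key left after the subtraction is 444, the id that has no document.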

Yours,
Damon
--
Damon Feldman
Sr. Principal Consultant, MarkLogic

From: [email protected] [mailto:[email protected]] On Behalf Of Paul M
Sent: Friday, February 08, 2013 9:19 AM
To: [email protected]
Subject: [MarkLogic Dev General] finding an id that does not exist

I have 4 documents: docA, docB, docC, docD. Each has a unique id field, with
values 111, 222, 333, 555 respectively. I have a sequence 111,222,333,444. 444
does not exist in the document set. Is there a faster way of finding this out?
I have looked at a few cts functions but I keep coming back to iterating over
the sequence 111,222,333,444 and running xdmp:estimate on a cts:search with a
cts:element-value-query for each value. Fast, but it still takes time. Maybe
co-occurrence, if the data has multiple id fields?
111-aaa,222-bbb,333-ccc,555-eee
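For reference, the per-value approach described above might look roughly like the following. The element name uniqueId is a placeholder, not taken from the actual data. Each id costs one index lookup, which is why the time adds up over thousands of values.

```xquery
xquery version "1.0-ml";

(: Hypothetical element name; substitute the real id element. :)
let $id-qn := xs:QName("uniqueId")
let $control-ids := ("111", "222", "333", "444")

for $id in $control-ids
(: xdmp:estimate resolves the count from the indexes without fetching documents. :)
where xdmp:estimate(cts:search(fn:doc(), cts:element-value-query($id-qn, $id))) eq 0
return $id
```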

thanks



_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
