Paul,

That may not be intractable, depending on the response time you need. E.g. this 
runs over 1M values in 10 seconds on my laptop:

(: map with up to 1M random keys :)
let $m1 := map:map()
let $add := for $i in 1 to 1000000
            return map:put($m1, xs:string(xdmp:random(1000000)), true())

(: map with up to 10k random keys :)
let $m2 := map:map()
let $add := for $i in 1 to 10000
            return map:put($m2, xs:string(xdmp:random(1000000)), true())

(: $m1 - $m2 keeps the keys of $m1 that are absent from $m2 :)
return (
  count(map:keys($m1 - $m2)),
  xdmp:elapsed-time()
  )

Yours,
Damon

From: Paul M [mailto:[email protected]]
Sent: Friday, February 08, 2013 10:54 AM
To: Damon Feldman; MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] finding an id that does not exist

Hi Damon,

The number of uniqueIds is fairly high, so the element-values result will be 
rather large (1 million+). The control sequence ids will be in the 1k-10k range 
and are non-sequential. The ids missing from the control sequence will likely 
number 100-1000. But I'll check and see.

Thanks..


________________________________
From: Damon Feldman <[email protected]>
To: Paul M <[email protected]>; MarkLogic Developer Discussion 
<[email protected]>
Sent: Friday, February 8, 2013 9:49 AM
Subject: RE: [MarkLogic Dev General] finding an id that does not exist

Paul,

I believe you can put a range index on the uniqueId (element or attribute), 
then call cts:element-values() with the option to return data as a map. You can 
put your other sequence into a map also and “subtract” the maps via the “-” 
operator to get a fast set difference.
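
A minimal sketch of that approach, not a tested implementation. It assumes a 
string range index on an element named uniqueId in no namespace; the element 
name, the index type, and the literal control ids are placeholders borrowed 
from the example in this thread.

```xquery
xquery version "1.0-ml";

(: Assumption: a string element range index exists on "uniqueId". :)
(: The "map" option returns the indexed values as keys of a map:map. :)
let $existing := cts:element-values(xs:QName("uniqueId"), (), "map")

(: Load the control sequence into a second map, one key per id. :)
let $control := map:map()
let $_ := for $id in ("111", "222", "333", "444")
          return map:put($control, $id, true())

(: Keys of $control that are absent from $existing. :)
return map:keys($control - $existing)
```

The operand order matters here: $control - $existing keeps the control ids 
that no document carries, while the reverse keeps document ids that are not 
in the control sequence.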

Yours,
Damon
--
Damon Feldman
Sr. Principal Consultant, MarkLogic

From: [email protected] On Behalf Of Paul M
Sent: Friday, February 08, 2013 9:19 AM
To: [email protected]
Subject: [MarkLogic Dev General] finding an id that does not exist

4 documents: docA, docB, docC, docD. Each has a unique id field, with values 
111, 222, 333, 555 respectively. I have a sequence 111,222,333,444. 444 does 
not exist in the document set docA, docB, docC, docD. Is there a faster way of 
finding this information? I have looked at a few cts functions, but I keep 
coming back to iterating over the sequence 111,222,333,444 and doing 
xdmp:estimate cts:search cts:element-value-query on each value. Fast, but it 
still takes time. Maybe co-occurrence, if the data has multiple id fields? 
111-aaa,222-bbb,333-ccc,555-eee
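
For reference, the per-value approach described above might look roughly like 
the sketch below. The element name uniqueId is a placeholder, since the actual 
field name is not given here.

```xquery
xquery version "1.0-ml";

(: Hypothetical element name for the unique id field. :)
let $qname := xs:QName("uniqueId")

(: One index-resolved estimate per control id; keep ids with no match. :)
for $id in ("111", "222", "333", "444")
where xdmp:estimate(
        cts:search(fn:doc(), cts:element-value-query($qname, $id))
      ) eq 0
return $id
```

Each estimate is resolved from the indexes, but the loop still issues one 
lookup per control id, which is where the time goes as the sequence grows.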

thanks


_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
