Definitely use the map option with element-values. Use an element-range-query 
instead of the element-value-query, too. You're already using the range index 
for the rest of the query, after all.

This returns the given values that are not valid, i.e. not present in the index:

xquery version "1.0-ml";
let $i1-qn := xs:QName("index1")
let $given-seq-codes := ("222","333", "newval")
  (: 1k to 10k values, with maybe 100 new ones; "newval" is a test new code value :)
let $qry := cts:element-range-query($i1-qn, '=', $given-seq-codes)
let $valid-map := cts:element-values($i1-qn, (), 'map', $qry)
let $given-map := map:map()
let $_ := for $i in $given-seq-codes return map:put($given-map, $i, $i) (: pre-ML6 :)
return map:keys($given-map - $valid-map)

With ML6 the map:put code could be a little more elegant - and probably a 
little faster for large sequences.

let $_ := $given-seq-codes ! map:put($given-map, ., .) (: ML6+ :)

If you do this in multiple places, some of the code above could be refactored 
into new functions.
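
For example, a minimal sketch of such a refactoring might look like the following. The function name local:missing-values is hypothetical, and the sketch assumes a string range index on the element, as in the query above:

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: returns the members of $given that have no
   matching value in the range index on element $qn.
   Assumes a string range index exists on $qn. :)
declare function local:missing-values(
  $qn as xs:QName,
  $given as xs:string*
) as xs:string*
{
  (: Values from the index, restricted to the given codes, as a map :)
  let $valid-map := cts:element-values(
    $qn, (), 'map', cts:element-range-query($qn, '=', $given))
  (: The given codes as a map, so the two maps can be subtracted :)
  let $given-map := map:map()
  let $_ := for $v in $given return map:put($given-map, $v, $v)
  (: Map difference: keys in $given-map that are absent from $valid-map :)
  return map:keys($given-map - $valid-map)
};

local:missing-values(xs:QName("index1"), ("222", "333", "newval"))
```

Each caller then only supplies the element QName and the sequence of codes to check.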

-- Mike

On 8 Feb 2013, at 09:01 , Damon Feldman <[email protected]> wrote:

> Paul,
>  
> This looks good to me, but map-based difference will be much, much faster 
> than using a predicate on a sequence as in:
> $given-seq-codes[not (. = $valid-codes-given-seq)]
>  
> and perhaps pre-retrieve the existing values as a map via:
>  
> cts:element-values($i1-qn, (), ("map"), $e-v-qry)
>  
> Damon
>  
> From: Paul M [mailto:[email protected]] 
> Sent: Friday, February 08, 2013 11:38 AM
> To: Damon Feldman; MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] finding an id that does not exist
>  
> xquery version "1.0-ml";
> declare namespace html = "http://www.w3.org/1999/xhtml";
> xdmp:query-trace(true()),
> let $i1-qn := xs:QName("index1")
> let $given-seq-codes := ("222","333", "newval")
>   (: 1k to 10k values, with maybe 100 new ones; "newval" is a test new code value :)
> let $e-v-qry := cts:element-value-query($i1-qn, ($given-seq-codes)) 
> let $valid-codes-given-seq := cts:element-values($i1-qn, (), (), $e-v-qry) 
> (:1k to 10k:)
> let $all-valid-codes := cts:element-values($i1-qn, ()) (:millions:)
> let $new-codes-given-seq:= $given-seq-codes[not (. = $valid-codes-given-seq)]
> return ($new-codes-given-seq, "|",$valid-codes-given-seq, "|", 
> $all-valid-codes) 
>  
> This appears to give correct response, so far, small test?
>  
> From: Damon Feldman <[email protected]>
> To: Paul M <[email protected]>; MarkLogic Developer Discussion 
> <[email protected]> 
> Sent: Friday, February 8, 2013 10:59 AM
> Subject: RE: [MarkLogic Dev General] finding an id that does not exist
>  
> Paul,
>  
> That may not be intractable, depending on the response time you need. E.g. 
> this runs on 1M values in 10 seconds on my laptop:
>  
> let $m1 := map:map()
> let $add:= for $i in 1 to 1000000 return map:put($m1, 
> xs:string(xdmp:random(1000000)), true())
>  
> let $m2 := map:map()
> let $add:= for $i in 1 to 10000 return map:put($m2, 
> xs:string(xdmp:random(1000000)), true())
>  
> return (
>   count(map:keys($m1 - $m2)),
>   xdmp:elapsed-time()
>   )
>  
> Yours,
> Damon
>  
> From: Paul M [mailto:[email protected]] 
> Sent: Friday, February 08, 2013 10:54 AM
> To: Damon Feldman; MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] finding an id that does not exist
>  
> Hi Damon,
> 
> The number of uniqueIds is somewhat high, so the element-values result will 
> be rather large (1mil+). The control seq ids will be in the 1k-10k range, 
> "non-sequential". The missing ids from the control seq will likely be in 
> the 100-1000 range.
> But I'll check and see.
> 
> Thanks..
>  
>  
> From: Damon Feldman <[email protected]>
> To: Paul M <[email protected]>; MarkLogic Developer Discussion 
> <[email protected]> 
> Sent: Friday, February 8, 2013 9:49 AM
> Subject: RE: [MarkLogic Dev General] finding an id that does not exist
>  
> Paul,
>  
> I believe you can range-index the uniqueId, element or attribute, then call 
> cts:element-values() with the option to return data as a map. You can put 
> your other sequence into a map also and “subtract” maps via the “-” operator 
> to get a fast set difference.
>  
> Yours,
> Damon
> --
> Damon Feldman
> Sr. Principal Consultant, MarkLogic
>  
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Paul M
> Sent: Friday, February 08, 2013 9:19 AM
> To: [email protected]
> Subject: [MarkLogic Dev General] finding an id that does not exist
>  
> 4 documents: docA, docB, docC, docD. Each has a unique id field, with values 
> 111, 222, 333, 555 respectively. I have a sequence 111,222,333,444. 444 does 
> not exist in the document set docA, docB, docC, docD. Is there a faster way 
> of finding this information? I have looked at a few cts functions but I keep 
> coming back to iterating over the sequence 111,222,333,444 and running 
> xdmp:estimate on a cts:search with a cts:element-value-query for each value. 
> Fast, but still takes time. Maybe co-occurrence, if the data has multiple id 
> fields? 111-aaa,222-bbb,333-ccc,555-eee
> 
> thanks
>  
>  
>  
> 
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general

