Definitely use the map option with element-values. Use an element-range-query
instead of the element-value-query, too. You're already using the range index
for the rest of the query, after all.
The query below returns the given values that are not present in the index:
xquery version "1.0-ml";
let $i1-qn := xs:QName("index1")
(: 1k to 10k values, with maybe 100 new ones; "newval" is a test new code value :)
let $given-seq-codes := ("222", "333", "newval")
(: restrict the lexicon lookup to the given values :)
let $qry := cts:element-range-query($i1-qn, '=', $given-seq-codes)
(: given values that exist in the range index, returned as a map :)
let $valid-map := cts:element-values($i1-qn, (), 'map', $qry)
let $given-map := map:map()
let $_ := for $i in $given-seq-codes return map:put($given-map, $i, $i) (: pre-ML6 :)
(: map difference: entries in $given-map that are absent from $valid-map :)
return map:keys($given-map - $valid-map)
With ML6 the map:put code could be a little more elegant - and probably a
little faster for large sequences.
let $_ := $given-seq-codes ! map:put($given-map, ., .) (: ML6+ :)
If you do this in multiple places, some of the code above could be refactored
into new functions.
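As a rough sketch of that refactoring, the lookup could be wrapped in a local function. The name local:missing-values and its parameters are made up for illustration, and it assumes a string range index exists on the element:

```xquery
xquery version "1.0-ml";

(: Hypothetical helper, not part of any MarkLogic API: returns the members
   of $given that have no matching value in the element range index on $qn.
   Assumes a string range index is configured for that element. :)
declare function local:missing-values(
  $qn as xs:QName,
  $given as xs:string*
) as xs:string*
{
  (: restrict the lexicon lookup to the given values :)
  let $qry := cts:element-range-query($qn, '=', $given)
  (: given values that exist in the index, returned as a map :)
  let $valid-map := cts:element-values($qn, (), 'map', $qry)
  let $given-map := map:map()
  let $_ := for $i in $given return map:put($given-map, $i, $i)
  (: map difference leaves only the given values absent from the index :)
  return map:keys($given-map - $valid-map)
};

local:missing-values(xs:QName("index1"), ("222", "333", "newval"))
```

Each place that needs the check then becomes a single function call with its own QName and value sequence.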
-- Mike
On 8 Feb 2013, at 09:01 , Damon Feldman <[email protected]> wrote:
> Paul,
>
> This looks good to me, but map-based difference will be much, much faster
> than using a predicate on a sequence as in:
> $given-seq-codes[not (. = $valid-codes-given-seq)]
>
> and perhaps pre-retrieve the existing values as a map via:
>
> cts:element-values($i1-qn, (), ("map"), $e-v-qry)
>
> Damon
>
> From: Paul M [mailto:[email protected]]
> Sent: Friday, February 08, 2013 11:38 AM
> To: Damon Feldman; MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] finding an id that does not exist
>
> xquery version "1.0-ml";
> declare namespace html = "http://www.w3.org/1999/xhtml";
> xdmp:query-trace(true()),
> let $i1-qn := xs:QName("index1")
> let $given-seq-codes := ("222","333", "newval") (:1k to 10k with maybe 100
> newvals:) (:newval is test new code value:)
> let $e-v-qry := cts:element-value-query($i1-qn, ($given-seq-codes))
> let $valid-codes-given-seq := cts:element-values($i1-qn, (), (), $e-v-qry)
> (:1k to 10k:)
> let $all-valid-codes := cts:element-values($i1-qn, ()) (:millions:)
> let $new-codes-given-seq:= $given-seq-codes[not (. = $valid-codes-given-seq)]
> return ($new-codes-given-seq, "|",$valid-codes-given-seq, "|",
> $all-valid-codes)
>
> This appears to give the correct response so far, in a small test.
>
> From: Damon Feldman <[email protected]>
> To: Paul M <[email protected]>; MarkLogic Developer Discussion
> <[email protected]>
> Sent: Friday, February 8, 2013 10:59 AM
> Subject: RE: [MarkLogic Dev General] finding an id that does not exist
>
> Paul,
>
> That may not be intractable, depending on the response time you need. E.g.
> this runs on 1M values in 10 seconds on my laptop:
>
> let $m1 := map:map()
> let $add:= for $i in 1 to 1000000 return map:put($m1,
> xs:string(xdmp:random(1000000)), true())
>
> let $m2 := map:map()
> let $add:= for $i in 1 to 10000 return map:put($m2,
> xs:string(xdmp:random(1000000)), true())
>
> return (
> count(map:keys($m1 - $m2)),
> xdmp:elapsed-time()
> )
>
> Yours,
> Damon
>
> From: Paul M [mailto:[email protected]]
> Sent: Friday, February 08, 2013 10:54 AM
> To: Damon Feldman; MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] finding an id that does not exist
>
> Hi Damon,
>
> The number of uniqueIds is fairly high, so the element-values result will be
> rather large (1 million+). The control sequence will hold 1k-10k
> "non-sequential" ids, and the ids from the control sequence that are missing
> will likely number 100 to 1000.
> But I'll check and see.
>
> Thanks..
>
>
> From: Damon Feldman <[email protected]>
> To: Paul M <[email protected]>; MarkLogic Developer Discussion
> <[email protected]>
> Sent: Friday, February 8, 2013 9:49 AM
> Subject: RE: [MarkLogic Dev General] finding an id that does not exist
>
> Paul,
>
> I believe you can range-index the uniqueId, element or attribute, then call
> cts:element-values() with the option to return data as a map. You can put
> your other sequence into a map also and “subtract” maps via the “-” operator
> to get a fast set difference.
>
> Yours,
> Damon
> --
> Damon Feldman
> Sr. Principal Consultant, MarkLogic
>
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Paul M
> Sent: Friday, February 08, 2013 9:19 AM
> To: [email protected]
> Subject: [MarkLogic Dev General] finding an id that does not exist
>
> 4 documents: docA, docB, docC, docD. Each has a unique id field, with values
> 111, 222, 333, 555 respectively. I have a sequence 111,222,333,444. 444 does
> not exist in the document set docA, docB, docC, docD. Is there a faster way
> of finding this information? I have looked at a few cts functions, but I keep
> coming back to iterating over the sequence 111,222,333,444 and running
> xdmp:estimate on a cts:search with a cts:element-value-query for each value.
> Fast, but it still takes time. Maybe co-occurrence, if the data has multiple
> id fields? 111-aaa,222-bbb,333-ccc,555-eee
>
> thanks
>
>
>
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general