The code below shows how to calculate co-occurrences between the values of an
element and the URIs of the documents that contain that element. It then
counts the co-occurring values for each URI. Note that you'll need to have
the URI lexicon enabled and an element range index on x.
(: Insert some dummy data :)
let $docs := (
  <a><x>B</x><x>BB</x></a>,
  <a><x>B</x></a>,
  <a><c>C</c></a>,
  <a><x>B</x><x>BBB</x></a>
)
return
  for $doc at $i in $docs
  return xdmp:document-insert($i || '.xml', $doc)
;
(: Calculate counts of <x/> grouped by document URIs. Requires element range
   index on xs:QName('x') :)
let $co-occurr := cts:value-co-occurrences(cts:uri-reference(),
  cts:element-reference(xs:QName('x')), 'map')
for $uri in map:keys($co-occurr)
return $uri || ': ' || fn:count(map:get($co-occurr, $uri))
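
To list only the documents with more than one x, which is what the original
question asks for, the same map can be filtered on the count. This is a
minimal sketch that I have not run against a live database. It assumes the
same URI lexicon and element range index as above, and it assumes that the
repeated x elements within a document carry different values (as in the dummy
data), since the map holds the values that co-occur with each URI.

```xquery
(: Sketch: keep only the URIs that co-occur with more than one <x/> value.
   The $count variable is introduced here for illustration. :)
let $co-occurr := cts:value-co-occurrences(cts:uri-reference(),
  cts:element-reference(xs:QName('x')), 'map')
for $uri in map:keys($co-occurr)
let $count := fn:count(map:get($co-occurr, $uri))
where $count gt 1
order by $uri
return $uri || ': ' || $count
```

In the dummy data, 1.xml and 4.xml are the documents that contain two x
elements, so those are the ones this filter is meant to surface.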
--
Justin Makeig
Director, Product Management
MarkLogic
[email protected]
> On Nov 17, 2016, at 11:19 PM, Raghu <[email protected]> wrote:
>
> Hi All,
>
> I've got around 40 million XML documents, of which a few have an element,
> say element x, twice (they are supposed to have only one element x). I need
> to find the list of documents with multiple occurrences of that element x.
> What would be the ideal way to query them?
>
> Thanks in advance
> _______________________________________________
> General mailing list
> [email protected]
> Manage your subscription at:
> http://developer.marklogic.com/mailman/listinfo/general