The code below shows how to calculate co-occurrences between the values of an 
element and the URIs of the documents that contain instances of that element. 
Then, for each URI, it counts the values of x that co-occur with it. Note that 
you'll need to have the URI lexicon enabled and an element range index on x. 

Justin

(: Insert some dummy data :)
let $docs := (
  <a><x>B</x><x>BB</x></a>,
  <a><x>B</x></a>,
  <a><c>C</c></a>,
  <a><x>B</x><x>BBB</x></a>
)
return 
  for $doc at $i in $docs 
  return xdmp:document-insert($i || '.xml', $doc)
;
(: Calculate counts of <x/> grouped by document URI. Requires an element
   range index on xs:QName('x') and the URI lexicon. :)
let $co-occurr := cts:value-co-occurrences(
  cts:uri-reference(),
  cts:element-reference(xs:QName('x')),
  'map'
)
for $uri in map:keys($co-occurr)
return $uri || ': ' || fn:count(map:get($co-occurr, $uri))
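
To answer the original question more directly, the same map can be filtered 
down to the URIs that have more than one value of x. This is only a sketch 
under the same assumptions (URI lexicon enabled, element range index on x). 
As far as I know, the lexicon call works on distinct values, so a document 
that repeats the identical value, for example <x>B</x><x>B</x>, would show a 
count of 1 and be missed by this filter.

```xquery
(: Sketch: list only the URIs whose documents hold more than one distinct
   value of <x/>. Assumes the URI lexicon and an element range index on x. :)
let $co-occurr := cts:value-co-occurrences(
  cts:uri-reference(),
  cts:element-reference(xs:QName('x')),
  'map'
)
for $uri in map:keys($co-occurr)
let $count := fn:count(map:get($co-occurr, $uri))
where $count gt 1
order by $uri
return $uri || ': ' || $count
```

With only the dummy data above loaded, this should list 1.xml and 4.xml.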



--
Justin Makeig
Director, Product Management
MarkLogic
[email protected]


> On Nov 17, 2016, at 11:19 PM, Raghu <[email protected]> wrote:
> 
> Hi All,
> 
> I have around 40 million XML documents, a few of which contain an element, 
> say x, twice (they are supposed to have only one x). I need to find the 
> list of documents with multiple occurrences of that element. What would be 
> the ideal way to query them?
> 
> Thanks in advance
> _______________________________________________
> General mailing list
> [email protected]
> Manage your subscription at: 
> http://developer.marklogic.com/mailman/listinfo/general




