This might get you started:
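(: Number of documents to sample when discovering element names. :)
let $size := 1000
(: Distinct element QNames found in the first $size documents. :)
let $distinct-element-qnames := distinct-values(
  for $i in doc()[1 to $size]//*
  return node-name($i) )
for $qn in $distinct-element-qnames
(: Index-based estimate of the fragments containing this element,
   taken across the whole database rather than just the sample. :)
let $frequency := xdmp:estimate(
  cts:search(doc(), cts:element-query($qn, cts:and-query(()) ) ) )
order by $frequency descending
return element element {
  attribute local-name { local-name-from-QName($qn) },
  attribute namespace { namespace-uri-from-QName($qn) },
  attribute frequency { $frequency } }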
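The frequencies cover the entire database, because xdmp:estimate
resolves each query from the indexes; they count the fragments that
contain each element, not individual occurrences. The list of QNames,
however, comes only from the sampled documents, so you may need to
increase $size until you are confident that you have coverage of all
QNames. Starting from doc()[1 to $size] gives you an arbitrary sample
of the available documents in stable order.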
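
One rough way to judge whether $size is large enough is to count the
distinct QNames at a few sample sizes and see where the count stops
growing. This is only a sketch: the three sizes are arbitrary, and it
relies on the same MarkLogic extensions as the query above
(zero-argument doc() and range predicates).

(: Sample sizes to compare; these values are arbitrary assumptions. :)
for $size in (1000, 2000, 4000)
return element sample {
  attribute size { $size },
  (: Number of distinct element QNames seen in the first $size docs. :)
  attribute distinct-qnames {
    count(distinct-values(
      for $i in doc()[1 to $size]//*
      return node-name($i) )) } }

If the last two sizes report the same count, the smaller one is
probably sufficient, although a rare element could still be missed.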
-- Mike
On 2010-09-16 08:49, Alf Eaton wrote:
> I'm hoping to be able to inspect a fairly large collection of
> documents and list the distinct element names, attribute names and
> their usage frequency. Is it possible to do this using built-in
> MarkLogic functions (perhaps by inspecting the indexes)?
>
> alf
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general