Thanks Michael, that's just what I needed. I removed the
namespace-related code as the elements are all in the same namespace,
and added a section to get the names of the attributes used on each
element (see below). It would be nice to get the usage frequency of
each attribute on the given element, too: is there an equivalent of
"cts:element-query($qn, cts:and-query(()))" that would work for
estimating attribute frequencies?
===
let $size := 10
let $distinct-element-qnames := fn:distinct-values(
  for $i in collection("ingested")[1 to $size]//*
  return fn:node-name($i)
)
for $qn in $distinct-element-qnames
let $name := fn:local-name-from-QName($qn)
let $attributes := fn:distinct-values(
  for $i in collection("ingested")[1 to $size]//*[local-name(.) = $name]/attribute::*
  return fn:node-name($i)
)
let $frequency := xdmp:estimate(
  cts:search(
    collection("ingested"),
    cts:element-query($qn, cts:and-query(()))
  )
)
order by $frequency descending
return element element {
  attribute name { $name },
  attribute frequency { $frequency },
  for $attribute-name in $attributes
  return element attribute { attribute name { $attribute-name } }
}
===
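Short of an index-based estimate, one fallback would be to count
attribute usage within the sample itself. This is only a sketch: it
assumes the same "ingested" collection and $size sample as above, the
"path" and "sample-count" attribute names are made up for illustration,
and the counts cover the sampled documents only, not the whole database.
===
let $size := 10
let $sample := collection("ingested")[1 to $size]
(: one key per attribute occurrence, e.g. "para/@id" :)
let $keys :=
  for $a in $sample//*/attribute::*
  return fn:concat(fn:local-name($a/..), "/@", fn:local-name($a))
for $key in fn:distinct-values($keys)
(: number of occurrences of this element/attribute pair in the sample :)
let $count := fn:count($keys[. = $key])
order by $count descending
return element attribute {
  attribute path { $key },
  attribute sample-count { $count }
}
===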
On 16 September 2010 17:42, Michael Blakeley wrote:
> This might get you started:
>
> let $size := 1000
> let $distinct-element-qnames := distinct-values(
> for $i in doc()[1 to $size]//*
> return node-name($i) )
> for $qn in $distinct-element-qnames
> let $frequency := xdmp:estimate(
> cts:search(doc(), cts:element-query($qn, cts:and-query(()) ) ) )
> order by $frequency descending
> return element element {
> attribute local-name { local-name-from-QName($qn) },
> attribute namespace { namespace-uri-from-QName($qn) },
> attribute frequency { $frequency } }
>
> The frequencies will cover the entire database, but you may need to
> increase $size until you are confident that you have coverage of all
> QNames. Starting from doc()[1 to $size] ensures a random sample of the
> available documents in stable order.
>
> -- Mike
>
> On 2010-09-16 08:49, Alf Eaton wrote:
>> I'm hoping to be able to inspect a fairly large collection of
>> documents and list the distinct element names, attribute names and
>> their usage frequency. Is it possible to do this using built-in
>> MarkLogic functions (perhaps by inspecting the indexes)?
>>
>> alf
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general