Keep in mind that the server indexes attributes as element-attribute
terms. In the interest of completeness I will also point out that the
indexes used by these queries map terms to documents, so xdmp:estimate
counts the number of documents that match, not the number of matching nodes.
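
To make the document-versus-node distinction concrete, here is a small
standard-XQuery illustration that runs without a database. It uses two
in-memory documents, and the names foo, bar and root are made up for the
example. The first number counts matching nodes and the second counts
matching documents. An index-based estimate for the foo/@bar term would be
expected to correspond to the second number, not the first.

```xquery
xquery version "1.0";

(: Two hypothetical documents: only the first has foo/@bar, but it has it twice. :)
let $docs := (
  document { <root><foo bar="1"/><foo bar="2"/></root> },
  document { <root><foo/></root> }
)
return (
  (: matching nodes: every foo element carrying @bar, across all documents -> 2 :)
  fn:count($docs//foo[@bar]),
  (: matching documents: documents containing at least one foo/@bar -> 1 :)
  fn:count($docs[.//foo/@bar])
)
```

The query returns the sequence (2, 1).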
Within those limits, this query seems to work. I have no wildcard
indexes enabled, so the estimate appears to ignore the '*' part and
simply looks for the element-attribute term.
xdmp:estimate(
  cts:search(
    doc(),
    cts:element-attribute-value-query(
      xs:QName('foo'), xs:QName('bar'), '*', 'wildcarded')))
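
Combining that estimate with the loop in Alf's script (quoted below) gives a
rough per-element, per-attribute frequency report. This is an untested
sketch. It assumes the "ingested" collection from Alf's query and reuses
his sampling of the first $size documents to discover element and attribute
names. It also assumes, as observed above, that with no wildcard indexes
enabled the '*' value is effectively ignored, so each attribute figure is the
number of documents containing that attribute on that element.

```xquery
xquery version "1.0-ml";

(: Sketch only: assumes the "ingested" collection and no wildcard indexes,
   so the '*' value below is effectively ignored by xdmp:estimate.
   Every frequency reported here is a document count, not a node count. :)
let $size := 10
let $sample := collection("ingested")[1 to $size]

(: discover element QNames from the sampled documents :)
let $element-qnames := fn:distinct-values(
  for $e in $sample//* return fn:node-name($e))

for $qn in $element-qnames
(: documents in the whole collection that contain this element :)
let $frequency := xdmp:estimate(
  cts:search(collection("ingested"),
    cts:element-query($qn, cts:and-query(()))))
order by $frequency descending
return element element {
  attribute name { fn:local-name-from-QName($qn) },
  attribute frequency { $frequency },
  (: attribute QNames seen on this element within the sample :)
  for $attr-qn in fn:distinct-values(
    for $a in $sample//*[fn:node-name(.) eq $qn]/@*
    return fn:node-name($a))
  return element attribute {
    attribute name { $attr-qn },
    (: documents in the whole collection with this attribute on this element :)
    attribute frequency {
      xdmp:estimate(
        cts:search(collection("ingested"),
          cts:element-attribute-value-query(
            $qn, $attr-qn, '*', 'wildcarded')))
    }
  }
}
```

Matching on node-name rather than local-name keeps the sketch correct if the
elements ever span more than one namespace. As with the element figures,
$size only limits which names are discovered, while the counts cover the
whole collection.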
-- Mike
On 2010-09-16 11:04, Alf Eaton wrote:
> Thanks Michael, that's just what I needed. I removed the
> namespace-related code as the elements are all in the same namespace,
> and added a section to get the names of the attributes used on each
> element (see below). It would be nice to get the usage frequency of
> each attribute on the given element, too: is there an equivalent of
> "cts:element-query($qn, cts:and-query(()))" that would work for
> estimating attribute frequencies?
>
> ===
> let $size := 10
>
> let $distinct-element-qnames := fn:distinct-values(
> for $i in collection("ingested")[1 to $size]//*
> return fn:node-name($i)
> )
>
> for $qn in $distinct-element-qnames
> let $name := fn:local-name-from-QName($qn)
>
> let $attributes := fn:distinct-values(
> for $i in collection("ingested")[1 to $size]//*[local-name(.) = $name]/attribute::*
> return fn:node-name($i)
> )
>
> let $frequency := xdmp:estimate(
> cts:search(
> collection("ingested"),
> cts:element-query($qn, cts:and-query(()))
> )
> )
> order by $frequency descending
>
> return element element {
> attribute name { $name },
> attribute frequency { $frequency },
> for $attribute-name in $attributes
> return element attribute { attribute name { $attribute-name } }
> }
> ===
>
> On 16 September 2010 17:42, Michael Blakeley
> <[email protected]> wrote:
>> This might get you started:
>>
>> let $size := 1000
>> let $distinct-element-qnames := distinct-values(
>> for $i in doc()[1 to $size]//*
>> return node-name($i) )
>> for $qn in $distinct-element-qnames
>> let $frequency := xdmp:estimate(
>> cts:search(doc(), cts:element-query($qn, cts:and-query(()) ) ) )
>> order by $frequency descending
>> return element element {
>> attribute local-name { local-name-from-QName($qn) },
>> attribute namespace { namespace-uri-from-QName($qn) },
>> attribute frequency { $frequency } }
>>
>> The frequencies will cover the entire database, but you may need to
>> increase $size until you are confident that you have coverage of all
>> QNames. Starting from doc()[1 to $size] ensures a random sample of the
>> available documents in stable order.
>>
>> -- Mike
>>
>> On 2010-09-16 08:49, Alf Eaton wrote:
>>> I'm hoping to be able to inspect a fairly large collection of
>>> documents and list the distinct element names, attribute names and
>>> their usage frequency. Is it possible to do this using built-in
>>> MarkLogic functions (perhaps by inspecting the indexes)?
>>>
>>> alf
>>> _______________________________________________
>>> General mailing list
>>> [email protected]
>>> http://developer.marklogic.com/mailman/listinfo/general
>>