Keep in mind that the server indexes attributes as element-attribute
terms. For completeness, I will also point out that the indexes used by
these queries map terms to documents, so xdmp:estimate counts the number
of documents that match, not the number of matching nodes.

Within those limits, this query seems to work. I have no wildcard
indexes enabled, so the estimate appears to ignore the '*' part and
simply look for the element-attribute term.

xdmp:estimate(
   cts:search(
     doc(),
     cts:element-attribute-value-query(
       xs:QName('foo'), xs:QName('bar'), '*', 'wildcarded') ))
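
To get a frequency for every attribute on every element, the same query
could be folded into your sampling loop. This is only a rough, untested
sketch: it assumes your "ingested" collection and $size sample, and it
relies on the behavior I observed above, where the '*' is ignored and
only the element-attribute term is looked up. The $sample, $element-qn
and $attribute-qn variables and the output element layout are names I
made up for illustration.

let $size := 10
(: sample used only to discover QNames; estimates cover the whole collection :)
let $sample := collection("ingested")[1 to $size]
for $element-qn in fn:distinct-values($sample//*/fn:node-name(.))
for $attribute-qn in fn:distinct-values(
   $sample//*[fn:node-name(.) eq $element-qn]/@*/fn:node-name(.))
(: number of documents containing this element-attribute pair :)
let $frequency := xdmp:estimate(
   cts:search(
     collection("ingested"),
     cts:element-attribute-value-query(
       $element-qn, $attribute-qn, '*', 'wildcarded') ))
order by $frequency descending
return element attribute {
   attribute element { fn:local-name-from-QName($element-qn) },
   attribute name { fn:local-name-from-QName($attribute-qn) },
   attribute frequency { $frequency } }

Comparing QNames directly, rather than local names, should keep this
correct if namespaces ever come into play.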

-- Mike

On 2010-09-16 11:04, Alf Eaton wrote:
> Thanks Michael, that's just what I needed. I removed the
> namespace-related code as the elements are all in the same namespace,
> and added a section to get the names of the attributes used on each
> element (see below). It would be nice to get the usage frequency of
> each attribute on the given element, too: is there an equivalent of
> "cts:element-query($qn, cts:and-query(()))" that would work for
> estimating attribute frequencies?
>
> ===
> let $size := 10
>
> let $distinct-element-qnames := fn:distinct-values(
>    for $i in collection("ingested")[1 to $size]//*
>      return fn:node-name($i)
> )
>
> for $qn in $distinct-element-qnames
>    let $name := fn:local-name-from-QName($qn)
>
>    let $attributes := fn:distinct-values(
>        for $i in collection("ingested")[1 to $size]//*[local-name(.) =
> $name]/attribute::*
>        return fn:node-name($i)
>    )
>
>    let $frequency := xdmp:estimate(
>      cts:search(
>        collection("ingested"),
>        cts:element-query($qn, cts:and-query(()))
>      )
>    )
>    order by $frequency descending
>
>    return element element {
>      attribute name { $name },
>      attribute frequency { $frequency },
>      for $attribute-name in $attributes
>        return element attribute { attribute name { $attribute-name } }
>    }
> ===
>
> On 16 September 2010 17:42, Michael Blakeley
> <[email protected]>  wrote:
>> This might get you started:
>>
>> let $size := 1000
>> let $distinct-element-qnames := distinct-values(
>>    for $i in doc()[1 to $size]//*
>>    return node-name($i) )
>> for $qn in $distinct-element-qnames
>> let $frequency := xdmp:estimate(
>>    cts:search(doc(), cts:element-query($qn, cts:and-query(()) ) ) )
>> order by $frequency descending
>> return element element {
>>    attribute local-name { local-name-from-QName($qn) },
>>    attribute namespace { namespace-uri-from-QName($qn) },
>>    attribute frequency { $frequency } }
>>
>> The frequencies will cover the entire database, but you may need to
>> increase $size until you are confident that you have coverage of all
>> QNames. Starting from doc()[1 to $size] ensures a random sample of the
>> available documents in stable order.
>>
>> -- Mike
>>
>> On 2010-09-16 08:49, Alf Eaton wrote:
>>> I'm hoping to be able to inspect a fairly large collection of
>>> documents and list the distinct element names, attribute names and
>>> their usage frequency. Is it possible to do this using built-in
>>> MarkLogic functions (perhaps by inspecting the indexes)?
>>>
>>> alf
>>> _______________________________________________
>>> General mailing list
>>> [email protected]
>>> http://developer.marklogic.com/mailman/listinfo/general
