A closer reading of the manual reveals my mistake: I needed to specify 
"item-frequency" in the element-values() query. Without it I was getting the 
count of *fragments* with the value, not the total number of occurrences. 

When I add the "item-frequency" option to element-values(), I get the
correct count from the sum of cts:frequency().
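To make the difference concrete, here is a small Python model. It is a hypothetical simulation, not MarkLogic code. Under the default fragment frequency, a value that appears twice in the same fragment counts once for that fragment, so summing the per-value frequencies undercounts the elements. With item frequency, every occurrence counts. The sample durations are made up.

```python
from collections import Counter

# Hypothetical model of a range-index lexicon, NOT the MarkLogic API.
# Each inner list stands for one fragment (here, one document) and holds the
# prof:overall-elapsed values it contains, written as duration strings.
docs = [
    ["PT0.5S", "PT0.5S", "PT1S"],  # the same duration twice in one fragment
    ["PT0.5S"],
    ["PT2S"],
]


def element_values(fragments: list[list[str]],
                   option: str = "fragment-frequency") -> dict[str, int]:
    """Map each distinct value to its frequency, mimicking the two options."""
    freq = Counter()
    for frag in fragments:
        if option == "item-frequency":
            freq.update(frag)       # every occurrence is counted
        else:
            freq.update(set(frag))  # a fragment counts at most once per value
    return dict(freq)


by_fragment = element_values(docs)
by_item = element_values(docs, "item-frequency")
total_elements = sum(len(frag) for frag in docs)

print(len(by_fragment))           # distinct values: 3
print(sum(by_fragment.values()))  # fragment-frequency sum: 4
print(sum(by_item.values()))      # item-frequency sum: 5
print(total_elements)             # actual element count: 5
```

In the numbers from the earlier message, the fragment-frequency sum was 47371 against 47539 elements. That is the kind of gap this model produces when some fragments contain the same duration more than once.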

Cheers,

E.
--
Eliot Kimber
http://contrext.com
 


On 8/14/17, 2:58 PM, "[email protected] on behalf of 
Eliot Kimber" <[email protected] on behalf of 
[email protected]> wrote:

    Using both cts:frequency and cts:count-aggregate, I get numbers that are
    closer to the correct count but still short by 168 (47371 vs. 47539). What
    would account for the difference?
    
    Queries:
    
    (: excerpt: $collection and the namespace prefixes are bound elsewhere :)
    let $profiles :=
      collection($collection)/enrprof:profiling-instance/enrprof:enrichment/enrprof:evalResult/prof:*
    let $histograms := $profiles/prof:histogram
    let $overall-elapsed := $profiles/prof:metadata/prof:overall-elapsed
    (: distinct lexicon values; frequencies default to fragment frequency :)
    let $durations := cts:element-values(xs:QName("prof:overall-elapsed"), (),
                         "descending", cts:collection-query($collection))
    (: total of the per-value frequencies :)
    let $count-frequency := sum(for $dur in $durations return cts:frequency($dur))
    let $overall-elapsed-ref :=
      cts:element-reference(fn:QName("http://marklogic.com/xdmp/profile", "overall-elapsed"),
                            ("type=dayTimeDuration"))
    (: frequency-aware count computed from the range index :)
    let $count-aggregate := cts:count-aggregate($overall-elapsed-ref, (),
                              cts:collection-query($collection))
    
    Results:
    
    <count-profiles>47539</count-profiles>
    <count-histograms>47539</count-histograms>
    <count-overall-elapsed>47539</count-overall-elapsed>
    <count-frequency>47371</count-frequency>
    <count-aggregate>47371</count-aggregate>
    <count-durations>21219</count-durations>
    
    Cheers,
    
    E.
    --
    Eliot Kimber
    http://contrext.com
     
    
    
    
    On 8/14/17, 1:53 PM, "[email protected] on behalf of 
Mary Holstege" <[email protected] on behalf of 
[email protected]> wrote:
    
        
        That is overkill.  The results you get out of cts:element-values have a
        frequency (accessible via cts:frequency). The cts: aggregates (e.g.
        cts:count, cts:sum) take the frequency into account.
        
        //Mary
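Mary's point means the counting and bucketing requirement from the original question can be met without materializing one item per element: the distinct values plus their frequencies are enough. The Python model below is illustrative only. The (value, frequency) pairs stand in for what cts:element-values with "item-frequency" and cts:frequency would provide; the durations, counts, and bucket bounds are made up.

```python
from collections import Counter
from datetime import timedelta

# Hypothetical (value, frequency) pairs standing in for the distinct values a
# lexicon call returns plus each value's frequency. Illustrative data only.
value_freqs = [
    (timedelta(milliseconds=400), 25),
    (timedelta(seconds=1.2), 10),
    (timedelta(seconds=3), 4),
]

# Upper bounds of the buckets: under 1s, under 2s, and everything else.
bounds = [timedelta(seconds=1), timedelta(seconds=2)]


def bucket_counts(pairs, upper_bounds):
    """Count occurrences per bucket, weighting each distinct value by its frequency."""
    counts = Counter()
    for value, freq in pairs:
        # The first bound the value is below; None means "at or above every bound".
        label = next((b for b in upper_bounds if value < b), None)
        counts[label] += freq
    return counts


counts = bucket_counts(value_freqs, bounds)
total = sum(freq for _, freq in value_freqs)  # frequency-aware total count

print(total)                                               # 39
print(counts[bounds[0]], counts[bounds[1]], counts[None])  # 25 10 4
```

MarkLogic also provides cts:element-value-ranges for bucketing directly against a range index; whether its frequency options suit this case is worth checking in its documentation.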
        
        On Mon, 14 Aug 2017 11:42:07 -0700, Oleksii Segeda  
        <[email protected]> wrote:
        
        > Eliot,
        >
        > You can do something like this:
        >
        >   cts:element-value-co-occurrences(xs:QName("prof:overall-elapsed"), xs:QName("xdmp:document"))
        > if you have only one element per document.
        >
        > Best,
        >
        > Oleksii Segeda
        > IT Analyst
        > Information and Technology Solutions
        > www.worldbank.org
        >
        >
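The "only one element per document" caveat can be seen in a small Python model. It is a hypothetical sketch, not the MarkLogic API, and it assumes that a co-occurrence lexicon reports each distinct (value, document) pair once. With one element per document the pair count equals the element count; when a document repeats a value, the pairs collapse. The document paths and durations are made up.

```python
# Hypothetical model of a value/document-URI co-occurrence lexicon, NOT the
# MarkLogic API. Assumption: each distinct (value, uri) pair is reported once.
def co_occurrences(docs: dict[str, list[str]]) -> set[tuple[str, str]]:
    """Return the distinct (value, uri) pairs found across the documents."""
    return {(value, uri) for uri, values in docs.items() for value in values}


one_per_doc = {"/a.xml": ["PT1S"], "/b.xml": ["PT1S"], "/c.xml": ["PT2S"]}
several_per_doc = {"/a.xml": ["PT1S", "PT1S"], "/b.xml": ["PT2S"]}

# One element per document: pairs == elements (3 == 3).
print(len(co_occurrences(one_per_doc)))
# Repeated value inside a document: the pairs collapse (2 pairs, 3 elements).
print(len(co_occurrences(several_per_doc)))
```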
        > -----Original Message-----
        > From: [email protected]  
        > [mailto:[email protected]] On Behalf Of Eliot  
        > Kimber
        > Sent: Monday, August 14, 2017 2:31 PM
        > To: MarkLogic Developer Discussion <[email protected]>
        > Subject: [MarkLogic Dev General] Count of cts:element-values() not equal
        > to number of element instances--what's going on?
        >
        > I have this query:
        >
        > let $durations := cts:element-values(xs:QName("prof:overall-elapsed"),
        >                      (), "descending",
        >                      cts:collection-query($collection))
        >
        > And this query:
        >
        > let $overall-elapsed := $profiles/prof:metadata/prof:overall-elapsed
        >
        > There is an element range index for prof:overall-elapsed.
        >
        > Comparing the two results, I get very different counts when I expected
        > them to be equal:
        >
        > <count-overall-elapsed>47539</count-overall-elapsed>
        > <count-durations>21219</count-durations>
        >
        > Doing this:
        >
        > count(distinct-values($overall-elapsed ! xs:dayTimeDuration(.)))
        >
        > Returns 21219, making it clear that the range index is returning
        > distinct values, not all values. That makes sense given how I would
        > expect a range index to be structured (a one-to-many mapping from values
        > to elements), but it doesn’t make sense as the result of a function named
        > “element-values” (rather than element-distinct-values).
        >
        > I didn’t see this behavior mentioned in the docs (although the
        > introduction to the Lexicon reference section does describe lexicons as
        > sets of unique values).
        >
        > My requirement is to *quickly* get a list of the durations for all  
        > prof:expression elements (which I use for both counting and for  
        > bucketing, so I need all values, not just all distinct values).
        >
        > Is there a way to do what I want using only indexes?
        >
        > Thanks,
        >
        > E.
        > --
        > Eliot Kimber
        > http://contrext.com
        >
        >
        >
        > _______________________________________________
        > General mailing list
        > [email protected]
        > Manage your subscription at:
        > http://developer.marklogic.com/mailman/listinfo/general
        
        
        
    
    
    


