Wild guess: empty prof:overall-elapsed elements that are
ignored/rejected by the range index?
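
To make the guess concrete, here is a minimal sketch in plain Python (not MarkLogic code). It assumes a typed index that silently skips any element whose text is not a valid xs:dayTimeDuration; the regular expression is a loose approximation of that lexical form, and the function names and sample values are hypothetical.

```python
import re

# Loose approximation of the xs:dayTimeDuration lexical form (e.g. PT0.25S, P1DT2H).
_DURATION = re.compile(
    r"^-?P(?:\d+D)?(?:T(?:\d+H)?(?:\d+M)?(?:\d+(?:\.\d+)?S)?)?$"
)


def is_day_time_duration(text: str) -> bool:
    """Return True if text looks like a valid xs:dayTimeDuration."""
    text = text.strip()
    if not _DURATION.match(text):
        return False
    # "P" or "PT" alone, or a dangling "T", carry no components and are invalid.
    return text.lstrip("-") not in ("P", "PT") and not text.endswith("T")


def count_instances_and_indexed(element_texts):
    """Return (number of elements, number a typed index would accept)."""
    instances = len(element_texts)
    indexed = sum(1 for t in element_texts if is_day_time_duration(t))
    return instances, indexed


# Made-up sample: two of the five elements are empty or whitespace only.
sample = ["PT0.5S", "PT0.5S", "", "PT2S", "  "]
print(count_instances_and_indexed(sample))  # (5, 3)
```

If the guess holds, an XPath count of empty prof:overall-elapsed elements should come out close to the gap between the two totals (47539 - 47371 = 168).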

Cheers

On 8/14/17, 9:58 PM, "general-boun...@developer.marklogic.com on behalf of
Eliot Kimber" <general-boun...@developer.marklogic.com on behalf of
ekim...@contrext.com> wrote:

>Using both cts:frequency and cts:count-aggregate I get numbers that are
>closer to the correct count but are short by about 200. What would
>account for the difference?
>
>Queries:
>
>let $profiles :=
>collection($collection)/enrprof:profiling-instance/enrprof:enrichment/enrprof:evalResult/prof:*
>let $histograms := $profiles/prof:histogram
>let $overall-elapsed := $profiles/prof:metadata/prof:overall-elapsed
>let $durations := cts:element-values(xs:QName("prof:overall-elapsed"),
>(), "descending",
>                     cts:collection-query($collection))
>let $count-frequency := sum(for $dur in $durations return
>cts:frequency($dur))
>let $overall-elapsed-ref :=
>cts:element-reference(fn:QName("http://marklogic.com/xdmp/profile","overall-elapsed"),("type=dayTimeDuration"))
>
>let $count-aggregate := cts:count-aggregate($overall-elapsed-ref,(),
>cts:collection-query($collection))
>
>Results:
>
><count-profiles>47539</count-profiles>
><count-histograms>47539</count-histograms>
><count-overall-elapsed>47539</count-overall-elapsed>
><count-frequency>47371</count-frequency>
><count-aggregate>47371</count-aggregate>
><count-durations>21219</count-durations>
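
One hypothesis worth ruling out, which nothing in this thread confirms: the lexicon functions accept "fragment-frequency" and "item-frequency" options, and as far as I understand the default counts a value once per fragment. If a document holds the same duration more than once, the per-fragment total falls short of the element count (here 47539 - 47371 = 168). The sketch below is a plain Python model of the two counting modes with made-up data, not MarkLogic code.

```python
from collections import Counter

# Hypothetical data: each inner list holds the overall-elapsed values
# found in one document (fragment).
documents = [
    ["PT1S", "PT1S", "PT2S"],  # PT1S occurs twice in the same document
    ["PT1S"],
    ["PT3S", "PT2S"],
]


def item_frequency(docs):
    """Count every occurrence of each value."""
    return Counter(value for doc in docs for value in doc)


def fragment_frequency(docs):
    """Count each value at most once per document."""
    return Counter(value for doc in docs for value in set(doc))


items = item_frequency(documents)
fragments = fragment_frequency(documents)

print(sum(items.values()))      # 6: matches the number of elements
print(sum(fragments.values()))  # 5: one short, because of the repeated PT1S
print(len(items))               # 3: distinct values
```

Passing "item-frequency" to cts:element-values and cts:count-aggregate would be the corresponding experiment on the real data.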
>
>Cheers,
>
>E.
>--
>Eliot Kimber
>http://contrext.com
> 
>
>
>
>On 8/14/17, 1:53 PM, "general-boun...@developer.marklogic.com on behalf
>of Mary Holstege" <general-boun...@developer.marklogic.com on behalf of
>mary.holst...@marklogic.com> wrote:
>
>    
>    That is overkill.  The results you get out of cts:element-values have a
>    frequency (accessible via cts:frequency). The cts: aggregates (e.g.
>    cts:count, cts:sum) take the frequency into account.
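
The idea of frequency-aware aggregates can be sketched in plain Python. This models a lexicon as (distinct value, frequency) pairs; it is not the cts: API, and the values and helper names are hypothetical.

```python
from datetime import timedelta

# Hypothetical lexicon contents: each distinct value paired with its frequency.
lexicon = [
    (timedelta(seconds=3), 1),
    (timedelta(seconds=2), 2),
    (timedelta(seconds=1), 4),
]


def weighted_count(entries):
    """Total number of occurrences, not the number of distinct values."""
    return sum(freq for _, freq in entries)


def weighted_sum(entries):
    """Sum of all occurrences, each value multiplied by its frequency."""
    return sum((value * freq for value, freq in entries), timedelta())


def expand(entries):
    """Rebuild the full list of values when every instance is needed."""
    return [value for value, freq in entries for _ in range(freq)]


print(len(lexicon))                           # 3 distinct values
print(weighted_count(lexicon))                # 7 occurrences
print(weighted_sum(lexicon).total_seconds())  # 11.0
print(len(expand(lexicon)))                   # 7
```

Working from the pairs keeps the cost proportional to the number of distinct values; expanding is only needed when downstream code insists on one item per instance.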
>    
>    //Mary
>    
>    On Mon, 14 Aug 2017 11:42:07 -0700, Oleksii Segeda
>    <oseg...@worldbankgroup.org> wrote:
>    
>    > Eliot,
>    >
>    > You can do something like this:
>    >
>    > cts:element-value-co-occurrences(xs:QName("prof:overall-elapsed"),xs:QName("xdmp:document"))
>    > if you have only one element per document.
>    >
>    > Best,
>    >
>    > Oleksii Segeda
>    > IT Analyst
>    > Information and Technology Solutions
>    > www.worldbank.org
>    >
>    >
>    > -----Original Message-----
>    > From: general-boun...@developer.marklogic.com
>    > [mailto:general-boun...@developer.marklogic.com] On Behalf Of Eliot Kimber
>    > Sent: Monday, August 14, 2017 2:31 PM
>    > To: MarkLogic Developer Discussion <general@developer.marklogic.com>
>    > Subject: [MarkLogic Dev General] Count of cts:element-values() not equal
>    > to number of element instances--what's going on?
>    >
>    > I have this query:
>    >
>    > let $durations := cts:element-values(xs:QName("prof:overall-elapsed"),
>    > (), "descending",
>    >                      cts:collection-query($collection))
>    >
>    > And this query:
>    >
>    > let $overall-elapsed := $profiles/prof:metadata/prof:overall-elapsed
>    >
>    > Where there is an element range index for prof:overall-elapsed.
>    >
>    > Comparing the two results I get very different numbers when I expected
>    > them to be equal:
>    >
>    > <count-overall-elapsed>47539</count-overall-elapsed>
>    > <count-durations>21219</count-durations>
>    >
>    > Doing this:
>    >
>    > count(distinct-values($overall-elapsed ! xs:dayTimeDuration(.)))
>    >
>    > Returns 21219, making it clear that the range index is returning
>    > distinct values, not all values. It makes sense in terms of how I would
>    > expect a range index to be structured (a one-to-many mapping from values
>    > to elements) but doesn't make sense as the return for a function named
>    > "element-values" (and not element-distinct-values).
>    >
>    > I didn't see this behavior mentioned in the docs (although the
>    > introduction to the Lexicon reference section does describe lexicons as
>    > sets of unique values).
>    >
>    > My requirement is to *quickly* get a list of the durations for all
>    > prof:expression elements (which I use for both counting and for
>    > bucketing, so I need all values, not just all distinct values).
>    >
>    > Is there a way to do what I want using only indexes?
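
Bucketing does not strictly need one item per element: distinct values with their frequencies are enough. A minimal sketch in plain Python, assuming the durations have already been converted to seconds; the bucket boundaries, sample pairs, and function name are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical bucket boundaries in seconds: [0, 0.1), [0.1, 1), [1, inf).
BOUNDS = [0.1, 1.0]


def bucket_counts(value_freqs, bounds=BOUNDS):
    """Histogram over (seconds, frequency) pairs, weighting by frequency."""
    counts = [0] * (len(bounds) + 1)
    for seconds, freq in value_freqs:
        # bisect_right places a value equal to a boundary in the upper bucket.
        counts[bisect_right(bounds, seconds)] += freq
    return counts


# Made-up (value, frequency) pairs standing in for lexicon output.
value_freqs = [(0.05, 10), (0.1, 3), (0.5, 2), (2.0, 1)]
print(bucket_counts(value_freqs))       # [10, 5, 1]
print(sum(bucket_counts(value_freqs)))  # 16 instances in total
```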
>    >
>    > Thanks,
>    >
>    > E.
>    > --
>    > Eliot Kimber
>    > http://contrext.com
>    >
>    >
>    >
>    > _______________________________________________
>    > General mailing list
>    > General@developer.marklogic.com
>    > Manage your subscription at:
>    > http://developer.marklogic.com/mailman/listinfo/general
>    
>    
>    
>
>
