If it's many thousands of documents, you're going to want to use range indexes 
to pull results out of memory and not load documents off disk.

It looks like you want to group into monthly ranges while the specific values 
are specified down to the day.  You can group using "bucketing".  See the 
cts:element-value-ranges() call and specify each month range as a bucket.  Use 
cts:frequency() to get the counts.

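I might actually suggest you treat the values as strings (a string range index) 
rather than xs:date values to avoid the timezone pain-in-the-neck stuff.  
Dates in YYYY-MM-DD form sort correctly as strings, so month prefixes work as 
bucket bounds.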

cts:element-value-ranges(
  xs:QName("signed"),
  ("2010-01", "2010-02", "2010-03", ...)
)
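
To get per-bucket counts out of that call, something along these lines should 
work. This is only a sketch: it assumes a string range index already exists on 
the no-namespace "signed" element with the default collation, and the four 
bounds are made up for illustration. The "empties" option asks for buckets 
with no values to be returned as well, so months with a zero count still show 
up.

```xquery
xquery version "1.0-ml";

(: Assumption: a string range index exists on <signed> (no namespace).
   The bounds are illustrative; list one entry per month you care about. :)
let $bounds := ("2010-01", "2010-02", "2010-03", "2010-04")
for $range in cts:element-value-ranges(
    xs:QName("signed"), $bounds, ("empties"))
return
  (: Each cts:range holds the bucket's bounds; cts:frequency() gives the
     number of fragments with a value falling inside that bucket. :)
  <data lower="{$range/cts:lower-bound}"
        upper="{$range/cts:upper-bound}"
        number="{cts:frequency($range)}"/>
```

Note that N bounds yield N+1 buckets: the first covers everything below the 
first bound and the last covers everything at or above the last bound, so 
you'll probably want to discard those two.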

If your cardinality is low (meaning the number of months is 100 or fewer) and 
you're willing for this to take a wee bit longer to execute, you can just do it 
with lots of xdmp:estimate() calls and not bother setting up the range indexes. 
With that approach you'd do something like this:

xdmp:estimate(
  cts:search(doc(), cts:element-word-query(xs:QName("signed"), "2009-09"))
)

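It'll give you the count for that month, matching "2009-09" as a phrase 
against the tokenized date value.  Make sure you have fast phrase queries on: 
xdmp:estimate() answers straight from the indexes without filtering, so the 
phrase has to be resolvable from the indexes for the count to be accurate.  Do 
a call like this for every month you care about.  It'll probably still be very 
fast.  My guess is you're only doing this report once, so this is a good way 
to get the answer without adding an index, if you don't already have it.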
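
Wrapped in a loop, the per-month estimates could produce the output format you 
described. Again just a sketch: the $months list is a placeholder for your 
real reporting period, and "signed" is assumed to be in no namespace.

```xquery
xquery version "1.0-ml";

(: Placeholder month list; substitute the months of your reporting period. :)
let $months := ("2009-09", "2009-10", "2009-11", "2009-12")
return
  <signed>{
    for $month in $months
    return
      (: One index-only estimate per month. :)
      <data month="{$month}" number="{
        xdmp:estimate(
          cts:search(doc(),
            cts:element-word-query(xs:QName("signed"), $month)))
      }"/>
  }</signed>
```

Swap the element name to get the "ratified" and "enforced" reports.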

-jh-


On Aug 17, 2011, at 1:46 AM, Jakob Fix wrote:

> Hello there,
> 
> I have many thousands of documents which all have these three elements
> (among others):
> 
> <signed>2009-09-10</signed>
> <ratified />
> <enforced>2010-02-15</enforced>
> 
> I need, for each month of a given time period, the number of documents
> that have been signed, ratified or enforced, like this:
> 
> <signed>
>  ..
>  <data month="2009-09" number="12"/>
>  <data month="2009-10" number="5"/>
>  <data month="2009-11" number="0"/>
>  <data month="2009-12" number="45"/>
>  ..
> </signed>
> 
> Same thing for the "ratified" and "enforced" values, of course. What
> would be the best and fastest way to aggregate this information? Using
> the search:search API? Doing classic Xquery or using MarkLogic
> extensions? The data will be used as input for some graphs (think
> Markmail).
> 
> Thanks for any pointers,
> Jakob.
> _______________________________________________
> General mailing list
> General@developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general
