Alexei,
I think it would be interesting to compare what you have now to:
let $fortnight := fn:current-dateTime() - xs:dayTimeDuration("P14D")
let $count3D := xdmp:estimate(
  fn:collection()/example[State = "TX" and SubmitDate > $fortnight])
return $count3D
and
let $fortnight := fn:current-dateTime() - xs:dayTimeDuration("P14D")
let $count3D := xdmp:estimate(
  cts:search(fn:collection()/example,
    cts:and-query((
      cts:element-value-query(xs:QName("State"), "TX"),
      (: needs a dateTime element range index on SubmitDate :)
      cts:element-range-query(xs:QName("SubmitDate"), ">", $fortnight)
    ))))
return $count3D
Expressions in XPath predicates are evaluated for each potential match, so I
suspect your code is doing the date math much more often than it needs to.
(Illustration: http://blog.davidcassel.net/2010/07/gotcha-sequence-index-evaluation/)
--
Dave Cassel
Developer Community Manager
MarkLogic Corporation <http://www.marklogic.com/>
Cell: +1-484-798-8720
From: Alexei Betin <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Friday, December 12, 2014 at 2:15 PM
To: "[email protected]" <[email protected]>
Subject: Re: [MarkLogic Dev General] efficiently limit documents count by dateTime field
Thanks, Mike!
You were right about simplifying the predicate. In the end, the major flaw was
converting SubmitDate with xs:dateTime( SubmitDate ), which for some reason I
had thought was required. It turns out I can use plain SubmitDate in the
predicate, and that, along with creating a path index on SubmitDate, gave me
just what I wanted:
let $count3D := fn:count( fn:collection()/example[State="TX" and ( SubmitDate >
( fn:current-dateTime() - xs:dayTimeDuration( "P14D") ) )])
let $count3D := xdmp:estimate( fn:collection()/example[State="TX" and (
SubmitDate > ( fn:current-dateTime() - xs:dayTimeDuration( "P14D") ) )])
Now the index kicks in: fn:count() became a lot faster, and xdmp:estimate()
started giving the correct answer (the same as fn:count()) while being a lot
faster still!
Thanks a lot for your response!
~Alexei Betin
On Thu Dec 11 15:39:34 PST 2014, Michael Blakeley <mike at blakeley.com> wrote:
You are on the right track: you'll want a dateTime element range index on
SubmitDate. However, I suspect the way the query is written is causing problems.
You could check this using xdmp:plan.
The problem may be that you're doing too much work in an XPath predicate. It's
easy to pretend that an XPath predicate acts like an array index, but its
performance characteristics are different. Most importantly, the predicate
expression can be evaluated many times: once for every item in the input. You
can read more about this at http://blakeley.com/blogofile/archives/518/
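As a rough illustration of that per-item evaluation (a sketch that is not from the original posts; the sequence and the use of xdmp:random are assumed purely for demonstration), a numeric predicate that calls a non-deterministic function can select zero, one, or several items, because it is re-evaluated for every item in the input:

```xquery
xquery version "1.0-ml";
(: Hypothetical demo: the predicate runs once per item, so each of the ten
   items is kept only if a freshly drawn random number equals its position. :)
let $seq := (1 to 10)
return $seq[xdmp:random(9) + 1]
```

Drawing the random number once, binding it to a variable, and then using that variable in the predicate should instead select exactly one item, since the value no longer changes between items.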
To get around this, bind as much of the predicate as possible to a variable.
let $fortnight := current-dateTime() - xs:dayTimeDuration("P14D")
return xdmp:estimate(
collection()/example[State="TX"][SubmitDate gt $fortnight])
Use xdmp:plan to verify that the right indexes are used.
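A minimal sketch of that check, assuming the same example documents and a range index on SubmitDate (the exact shape of the plan output varies by server version, so none is shown here):

```xquery
xquery version "1.0-ml";
(: Ask the server how it would resolve the expression instead of running it. :)
let $fortnight := fn:current-dateTime() - xs:dayTimeDuration("P14D")
return xdmp:plan(
  fn:collection()/example[State = "TX"][SubmitDate gt $fortnight])
```

The returned XML describes the index lookups the expression resolves to. If nothing in it refers to SubmitDate, the date comparison is probably not being answered from a range index, which would explain an xdmp:estimate result that ignores the date.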
Alternatively you could use a cts:search and cts:element-range-query instead of
an XPath predicate.
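One possible shape for that alternative is sketched below. It assumes a dateTime path range index already exists on /example/SubmitDate (swap in cts:element-range-query if an element range index is configured instead), and that the path string matches the configured index exactly:

```xquery
xquery version "1.0-ml";
(: Compute the cutoff once, outside any predicate. :)
let $fortnight := fn:current-dateTime() - xs:dayTimeDuration("P14D")
let $query := cts:and-query((
  cts:element-value-query(xs:QName("State"), "TX"),
  (: assumed index: dateTime path range index on /example/SubmitDate :)
  cts:path-range-query("/example/SubmitDate", ">", $fortnight)
))
(: xdmp:estimate answers from the indexes without fetching documents. :)
return xdmp:estimate(cts:search(fn:collection()/example, $query))
```

Because the estimate is resolved from the indexes alone, it only matches fn:count when both constraints are fully covered by indexes.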
-- Mike
On 11 Dec 2014, at 14:52 , Alexei Betin <ABetin at elevate.com> wrote:
Hi,
I am very new to both MarkLogic and XQuery, and this is my first post here. My
question is as follows:
I am trying to count documents that meet certain criteria and also fall into a
particular date range (such as within a 14-day window from today). I am
experimenting with fn:count and xdmp:estimate, e.g.:
let $count3D := fn:count( fn:collection()/example[State="TX" and ( xs:dateTime(
SubmitDate ) > ( fn:current-dateTime() - xs:dayTimeDuration( "P14D") ) )])
or
let $count3D := xdmp:estimate( fn:collection()/example[State="TX" and (
xs:dateTime( SubmitDate ) > ( fn:current-dateTime() - xs:dayTimeDuration(
"P14D") ) )])
Sure enough, fn:count gives the correct answer but is rather slow, whereas
xdmp:estimate() is very fast but appears to filter the count only by state,
completely ignoring the dateTime-based criterion, so it's grossly incorrect.
Any advice on where I go from here - for either making fn:count() faster or
making xdmp:estimate() more accurate? Either creating some kind of index or
improving the query syntax or both?
I tried creating a range path index on example/SubmitDate path but it did not
seem to help anything so I am not sure I am on the right track - I'd appreciate
any clues or pointers on how to approach this correctly.
Thanks,
Alexei Betin
Principal Architect; Big Data
P: (817) 928-1643 | Elevate.com
4150 International Plaza, Suite 300
Fort Worth, TX 76109
_______________________________________________
General mailing list
General at developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general