Hi Dave,

Yes, I tried that too, and it did not make much difference. I think the reason is this: an expression like xs:dateTime( SubmitDate ) has to be evaluated for every document, which prevents the index from being used, because the index is built on SubmitDate, not on xs:dateTime( SubmitDate ). The expression fn:current-dateTime() - xs:dayTimeDuration( "P14D"), by contrast, does not depend on the contents of any individual document, so the MarkLogic query engine is presumably smart enough to evaluate it only once per query and then compare the computed value against the index.

Thanks,
~Alexei

----------------------------------------------------------------------

Message: 1
Date: Fri, 12 Dec 2014 19:27:03 +0000
From: Dave Cassel <[email protected]>
Subject: Re: [MarkLogic Dev General] efficiently limit documents count by dateTime field
To: MarkLogic Developer Discussion <[email protected]>
Message-ID: <d0b0aa5f.840de%[email protected]>
Content-Type: text/plain; charset="us-ascii"

Alexei,

I think it would be interesting to compare what you have now to:

    let $fortnight := fn:current-dateTime() - xs:dayTimeDuration( "P14D")
    let $count3D := xdmp:estimate( fn:collection()/example[State="TX" and ( SubmitDate > ( $fortnight ) )])

and

    let $fortnight := fn:current-dateTime() - xs:dayTimeDuration( "P14D")
    let $count3D := xdmp:estimate(cts:search(fn:collection()/example,
        cts:and-query((
            cts:element-value-query(xs:QName("State"), "TX"),
            cts:element-range-query(xs:QName("SubmitDate"), ">", $fortnight)
        ))))

Expressions in XPath predicates are evaluated for each potential match, so I suspect your code is doing the date math much more often than it needs to. (illustration <http://blog.davidcassel.net/2010/07/gotcha-sequence-index-evaluation/>)

--
Dave Cassel
Developer Community Manager
MarkLogic Corporation <http://www.marklogic.com/>
Cell: +1-484-798-8720

From: Alexei Betin <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Friday, December 12, 2014 at 2:15 PM
To: "[email protected]" <[email protected]>
Subject: Re: [MarkLogic Dev General] efficiently limit documents count by dateTime field

Thanks, Mike!

You were right about simplifying the predicate. In the end, the major flaw turned out to be converting SubmitDate with xs:dateTime( SubmitDate ), which for some reason I had thought was required. It turns out I can just use plain SubmitDate in the predicate, and that, along with creating a path index on SubmitDate, gave me just what I wanted:

    let $count3D := fn:count( fn:collection()/example[State="TX" and ( SubmitDate > ( fn:current-dateTime() - xs:dayTimeDuration( "P14D") ) )])

    let $count3D := xdmp:estimate( fn:collection()/example[State="TX" and ( SubmitDate > ( fn:current-dateTime() - xs:dayTimeDuration( "P14D") ) )])

Now the index kicks in: fn:count() became a lot faster, and xdmp:estimate() started giving the correct answer (the same as fn:count()) and is faster still!

Thanks a lot for your response!
~Alexei Betin

On Thu Dec 11 15:39:34 PST 2014, Michael Blakeley <mike at blakeley.com> wrote:

You are on the right track: you'll want a dateTime element range index on SubmitDate. However, I suspect the way the query is written is causing problems. You could check this using xdmp:plan.

The problem may be that you're doing too much work in an XPath predicate. It's easy to pretend that an XPath predicate acts like an array index, but its performance characteristics are different. Most importantly, the predicate expression can be evaluated many times: once for every item in the input. You can read more about this at http://blakeley.com/blogofile/archives/518/

To get around this, bind as much of the predicate as possible to a variable.

    let $fortnight := current-dateTime() - xs:dayTimeDuration("P14D")
    return xdmp:estimate(
      collection()/example[State="TX"][SubmitDate gt $fortnight])

Use xdmp:plan to verify that the right indexes are used. Alternatively, you could use a cts:search and cts:element-range-query instead of an XPath predicate.

-- Mike

On 11 Dec 2014, at 14:52, Alexei Betin <ABetin at elevate.com> wrote:

Hi,

I am very new to both MarkLogic and XQuery, and this is my first post here. My question is as follows:

I am trying to count documents that meet certain criteria and also fall into a particular date range (such as within a 14-day window from today). I am experimenting with fn:count and xdmp:estimate, e.g.:

    let $count3D := fn:count( fn:collection()/example[State="TX" and ( xs:dateTime( SubmitDate ) > ( fn:current-dateTime() - xs:dayTimeDuration( "P14D") ) )])

or

    let $count3D := xdmp:estimate( fn:collection()/example[State="TX" and ( xs:dateTime( SubmitDate ) > ( fn:current-dateTime() - xs:dayTimeDuration( "P14D") ) )])

Sure enough, fn:count gives the correct answer but is rather slow, whereas xdmp:estimate() is very fast but appears to filter only by state and to ignore the dateTime-based criterion completely, so its result is grossly incorrect.

Any advice on where I go from here, for either making fn:count() faster or making xdmp:estimate() more accurate? Either creating some kind of index, or improving the query syntax, or both?

I tried creating a path range index on the example/SubmitDate path, but it did not seem to help, so I am not sure I am on the right track. I'd appreciate any clues or pointers on how to approach this correctly.

Thanks,

Alexei Betin
Principal Architect; Big Data
P: (817) 928-1643 | Elevate.com
4150 International Plaza, Suite 300
Fort Worth, TX 76109

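The cts:search alternative mentioned in the replies above can be sketched roughly as follows. This is a minimal sketch, not code taken from the thread. It assumes an <example> root element with State and SubmitDate children, as in the queries quoted above, and a dateTime element range index on SubmitDate (the kind Mike recommends). With a path range index instead, cts:path-range-query would be the corresponding constructor. The variable name $query is made up for illustration.

```xquery
xquery version "1.0-ml";

(: Sketch only. Assumes a dateTime element range index on SubmitDate;
   without it, the cts:element-range-query below raises an error. :)

(: Compute the cutoff once, outside any predicate. :)
let $fortnight := fn:current-dateTime() - xs:dayTimeDuration("P14D")

(: $query is a hypothetical name: state match plus range constraint. :)
let $query :=
  cts:and-query((
    cts:element-value-query(xs:QName("State"), "TX"),
    cts:element-range-query(xs:QName("SubmitDate"), ">", $fortnight)
  ))

return (
  (: Index-only count of the matching fragments. :)
  xdmp:estimate(cts:search(fn:collection()/example, $query)),
  (: Query plan, to inspect which indexes are actually consulted. :)
  xdmp:plan(cts:search(fn:collection()/example, $query))
)
```

Comparing the xdmp:plan output with and without the range query is one way to check that the SubmitDate index is actually being used.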
_______________________________________________
General mailing list
General at developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general

End of General Digest, Vol 126, Issue 24
****************************************
