Hi Dave,

Yes, I tried that too, and it did not make much difference. I think the reason
is this: something like xs:dateTime( SubmitDate ) has to be evaluated for every
document, which prevents the index from being used, since the index is built on
SubmitDate and not on xs:dateTime( SubmitDate ). The expression
fn:current-dateTime() - xs:dayTimeDuration( "P14D"), by contrast, does not
depend on individual document contents, so the MarkLogic query engine is
presumably smart enough to evaluate it once per query and compare that single
value against the index.

Thanks,

~Alexei

----------------------------------------------------------------------

Message: 1
Date: Fri, 12 Dec 2014 19:27:03 +0000
From: Dave Cassel <[email protected]>
Subject: Re: [MarkLogic Dev General] efficiently limit documents count
        by dateTime field
To: MarkLogic Developer Discussion <[email protected]>
Message-ID: <d0b0aa5f.840de%[email protected]>
Content-Type: text/plain; charset="us-ascii"

Alexei,

I think it would be interesting to compare what you have now to:

let $fortnight := fn:current-dateTime() - xs:dayTimeDuration("P14D")
let $count3D := xdmp:estimate(
  fn:collection()/example[State="TX" and (SubmitDate > $fortnight)])
return $count3D

and
let $fortnight := fn:current-dateTime() - xs:dayTimeDuration("P14D")
let $count3D := xdmp:estimate(
  cts:search(fn:collection()/example,
    cts:and-query((
      cts:element-value-query(xs:QName("State"), "TX"),
      (: needs a dateTime element range index on SubmitDate :)
      cts:element-range-query(xs:QName("SubmitDate"), ">", $fortnight)
    ))))
return $count3D

Expressions in XPath predicates are evaluated for each potential match, so I 
suspect your code is doing the date math much more often than it needs to. 
(illustration<http://blog.davidcassel.net/2010/07/gotcha-sequence-index-evaluation/>)

--
Dave Cassel
Developer Community Manager
MarkLogic Corporation<http://www.marklogic.com/>
Cell:  +1-484-798-8720


From: Alexei Betin <[email protected]>
Reply-To: MarkLogic Developer Discussion <[email protected]>
Date: Friday, December 12, 2014 at 2:15 PM
To: <[email protected]>
Subject: Re: [MarkLogic Dev General] efficiently limit documents count by
dateTime field

Thanks, Mike!

You were right about simplifying the predicate. In the end, the major flaw
turned out to be converting SubmitDate with xs:dateTime( SubmitDate ), which
for some reason I had thought was required. I can just use plain SubmitDate in
the predicate, and that, along with creating a path index on SubmitDate, gave
me just what I wanted:

let $count3D := fn:count( fn:collection()/example[State="TX" and
  ( SubmitDate > ( fn:current-dateTime() - xs:dayTimeDuration( "P14D") ) )])

let $count3D := xdmp:estimate( fn:collection()/example[State="TX" and
  ( SubmitDate > ( fn:current-dateTime() - xs:dayTimeDuration( "P14D") ) )])

Now the index kicks in: fn:count() became a lot faster, and xdmp:estimate()
started giving the correct answer (same as fn:count()) while being faster
still!
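
One way to confirm that agreement is to run both functions over the same
expression and compare the results. This is only a sketch: it assumes a
dateTime range index on SubmitDate is already in place, and the variable names
are illustrative.

```xquery
xquery version "1.0-ml";

(: Bind the cutoff once so both calls compare against the same value. :)
let $fortnight := fn:current-dateTime() - xs:dayTimeDuration("P14D")
(: Count resolved from the indexes alone; no documents are fetched. :)
let $estimate := xdmp:estimate(
  fn:collection()/example[State = "TX"][SubmitDate > $fortnight])
(: Exact count; the matching nodes are actually retrieved and counted. :)
let $exact := fn:count(
  fn:collection()/example[State = "TX"][SubmitDate > $fortnight])
return ($estimate, $exact, $estimate eq $exact)
```

If the two numbers differ, some part of the predicate is probably not being
resolved from the indexes.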

Thanks a lot for your response!

~Alexei Betin

On Thu Dec 11 15:39:34 PST 2014, Michael Blakeley <mike at blakeley.com> wrote:

You are on the right track: you'll want a dateTime element range index on
SubmitDate. However, I suspect the way the query is written is causing
problems. You could check this using xdmp:plan.
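
For reference, a dateTime element range index can be added through the Admin
API as well as through the Admin UI. The sketch below is a minimal example
under stated assumptions: "Documents" is a placeholder database name, and
SubmitDate is assumed to be in no namespace.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" is a placeholder database name; substitute your own. :)
let $config := admin:get-configuration()
let $dbid := xdmp:database("Documents")
(: dateTime index on SubmitDate, assuming the element is in no namespace.
   Scalar types other than string take an empty collation. :)
let $index := admin:database-range-element-index(
  "dateTime", "", "SubmitDate", "", fn:false())
let $config := admin:database-add-range-element-index($config, $dbid, $index)
return admin:save-configuration($config)
```

The database reindexes after the configuration is saved, so queries may not
benefit from the new index until reindexing completes.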

The problem may be that you're doing too much work in an XPath predicate. It's 
easy to pretend that an XPath predicate acts like an array index, but its 
performance characteristics are different. Most importantly, the predicate 
expression can be evaluated many times: once for every item in the input. You 
can read more about this at http://blakeley.com/blogofile/archives/518/

To get around this, bind as much of the predicate as possible to a variable.

    let $fortnight := current-dateTime() - xs:dayTimeDuration("P14D")
    return xdmp:estimate(
      collection()/example[State="TX"][SubmitDate gt $fortnight])

Use xdmp:plan to verify that the right indexes are used.
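
A minimal sketch of that check, reusing the same expression. The general
comparison operator is used here only because it is the form that worked
elsewhere in this thread.

```xquery
xquery version "1.0-ml";

let $fortnight := fn:current-dateTime() - xs:dayTimeDuration("P14D")
(: Returns an XML description of the query plan instead of the results. :)
return xdmp:plan(
  fn:collection()/example[State = "TX"][SubmitDate > $fortnight])
```

The exact output varies by MarkLogic version, but if the plan only mentions
the State term and nothing about a range comparison on SubmitDate, the date
constraint is not being resolved from the index.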

Alternatively you could use a cts:search and cts:element-range-query instead of 
an XPath predicate.
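
If the index is a path range index rather than an element range index, the
counterpart is cts:path-range-query. The sketch below assumes the index was
configured with the path "/example/SubmitDate" and scalar type dateTime; the
path string passed to the query has to match the configured index.

```xquery
xquery version "1.0-ml";

let $fortnight := fn:current-dateTime() - xs:dayTimeDuration("P14D")
return xdmp:estimate(
  cts:search(fn:collection()/example,
    cts:and-query((
      cts:element-value-query(xs:QName("State"), "TX"),
      (: Assumed path; must match a configured dateTime path range index. :)
      cts:path-range-query("/example/SubmitDate", ">", $fortnight)
    ))))
```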

-- Mike

On 11 Dec 2014, at 14:52 , Alexei Betin <ABetin at elevate.com> wrote:
Hi,

I am very new to both MarkLogic and XQuery, and this is my first post here. My
question is as follows:

I am trying to count documents that meet certain criteria and also fall into a
particular date range (such as within a 14-day window from today). I am
experimenting with fn:count and xdmp:estimate, e.g.:

let $count3D := fn:count( fn:collection()/example[State="TX" and
  ( xs:dateTime( SubmitDate ) >
    ( fn:current-dateTime() - xs:dayTimeDuration( "P14D") ) )])

or

let $count3D := xdmp:estimate( fn:collection()/example[State="TX" and
  ( xs:dateTime( SubmitDate ) >
    ( fn:current-dateTime() - xs:dayTimeDuration( "P14D") ) )])

Sure enough, the fn:count gives the correct answer but is rather slow, whereas 
xdmp:estimate() is very fast but it appears to be only filtering the count by 
state and completely ignores the dateTime-based criteria so it's grossly 
incorrect.

Any advice on where I go from here - for either making fn:count() faster or 
making xdmp:estimate() more accurate? Either creating some kind of index or 
improving the query syntax or both?

I tried creating a range path index on example/SubmitDate path but it did not 
seem to help anything so I am not sure I am on the right track - I'd appreciate 
any clues or pointers on how to approach this correctly.

Thanks,

Alexei Betin
Principal Architect; Big Data
P: (817) 928-1643 | Elevate.com
4150 International Plaza, Suite 300
Fort Worth, TX 76109

_______________________________________________
General mailing list
General at developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general

End of General Digest, Vol 126, Issue 24
****************************************