Right, predicates only look like array indexes. The predicate expression will 
be evaluated for every item in the input sequence.
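
A minimal sketch of that rule in plain XQuery; the sequence $seq and its contents are made up for illustration. A numeric predicate is shorthand for a comparison against position(), and any predicate is tested once per item:

```xquery
(: Hypothetical input sequence, not taken from the thread. :)
let $seq := ("a", "b", "c", "d")
return (
  $seq[3],                     (: numeric predicate: selects "c" :)
  $seq[position() = 3],        (: the same test spelled out: selects "c" :)
  $seq[position() mod 2 = 0]   (: tested against every item: selects "b", "d" :)
)
```

Because the predicate is re-evaluated for each item, an expression that changes value between evaluations can match zero, one, or several items.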

I touched on this at http://blakeley.com/blogofile/archives/518/

> Function calls inside an XPath predicate can be horrible for performance, 
> since the function must be called for every item in the predicate's input 
> sequence. If the result of the function call is static, simply bind the result 
> to a variable. This is also true of operators: even simple math operations 
> like:
> $list[$start to $start + $size]
> should be rewritten as
> $list[$start to $stop]
> If you have trouble seeing why this might be a problem, consider a list with 
> 100 items. Now consider this expression:
> $list[ xdmp:sleep(100) ]
> Evaluation will cost 100-ms per item, or 10 seconds total. Every expression 
> takes a finite amount of time to evaluate, and performance optimization is 
> sometimes a matter of reducing the expression count.
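
As a rough sketch of the rewrite described in the excerpt: $start, $size and $stop are the names it uses, while the contents of $list and the concrete numbers are assumptions for illustration. The positional test is written out with position() here rather than in the excerpt's shorthand, so that it is plain standard XQuery:

```xquery
(: Hypothetical data: any sequence would do. :)
let $list := (1 to 1000)
let $start := 10
let $size := 5

(: Compute the upper bound once, outside the predicate. :)
let $stop := $start + $size

(: Only the comparison against position() is left to run per item. :)
return $list[position() = ($start to $stop)]
```

With this data the query returns the items at positions 10 through 15. Whether a given engine would have hoisted the addition on its own is not something this sketch tries to show; binding the variable simply removes the question.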


-- Mike

On 3 Oct 2012, at 13:06 , Will Thompson wrote:

> Thanks for the explanation David. I thought it seemed odd, but this makes 
> sense.
>  
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of David Lee
> Sent: Wednesday, October 03, 2012 12:54 PM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Unexpected behavior for xdmp:random() in 
> predicate
>  
> At first this seemed like a bug but profiling it makes more sense ...
> xdmp:random() is not a typical XQuery function, as it produces a different 
> value on every call
> (unlike most XQuery functions ... looking for the word "imdopedant" but can't 
> find it).
>  
> Profiling these 2 expressions on my own DB which has /*:twitter elements
>  
> for $i in (1 to 10)
> let $idx := xdmp:random(1000)+1
> return count((//*:twitter)[$idx])
>  
> vs
>  
> for $i in (1 to 10)
> return count((//*:twitter)[xdmp:random(1000)+1])
>  
>  
> I get similar results as you.
>  
> But the profile shows all
> #1
>  
> Profile: 52 Expressions, PT6.13123S
>  
> Module:Line No.:Col No. | Count | Shallow % | Shallow µs | Deep %   | Deep µs | Expression
> .main:3:16              |    10 | 91        |    5580242 | 91       | 5580242 | fn:collection()/descendant::*:twitter
> .main:3:16              |    10 | 9         |     549566 | 100      | 6129808 | (fn:collection()/descendant::*:twitter)[$idx]
> .main:3:7               |    10 | 0.013     |        795 | 100      | 6130603 | fn:count((fn:collection()/descendant::*:twitter)[$idx])
> .main:1:0               |     1 | 0.0022    |        134 | 100      | 6130892 | for $i in 1 to 10 let $idx := xdmp:random(1000) + 1 return fn:count((fn:collection()/descendant::*:twitter)[$idx])
> .main:2:12              |    10 | 0.0015    |         89 | 0.0015   |      89 | xdmp:random(1000)
> .main:2:29              |    10 | 0.001     |         63 | 0.0025   |     152 | xdmp:random(1000) + 1
> .main:1:13              |     1 | 0.000049  |          3 | 0.000049 |       3 | 1 to 10
>  
> #2
>  
>  
> Profile: 2134232 Expressions, PT8.293776S
>  
> Module:Line No.:Col No. | Count   | Shallow % | Shallow µs | Deep %   | Deep µs | Expression
> .main:3:16              |      10 | 68        |    5680234 | 68       | 5680234 | fn:collection()/descendant::*:twitter
> .main:3:7               |      10 | 13        |    1109878 | 100      | 8290044 | fn:count((fn:collection()/descendant::*:twitter)[xdmp:random(1000) + 1])
> .main:3:27              | 1067100 | 8.1       |     668253 | 8.1      |  668253 | xdmp:random(1000)
> .main:3:44              | 1067100 | 5.3       |     438670 | 13       | 1106923 | xdmp:random(1000) + 1
> .main:3:16              |      10 | 4.7       |     393009 | 79       | 6524821 | (fn:collection()/descendant::*:twitter)[xdmp:random(1000) + 1]
> .main:1:0               |       1 | 0.0012    |        101 | 100      | 8290147 | for $i in 1 to 10 return fn:count((fn:collection()/descendant::*:twitter)[xdmp:random(1000) + 1])
> .main:1:13              |       1 | 0.000024  |          2 | 0.000024 |       2 | 1 to 10
>  
> ----------------
>  
> So in the first case, xdmp:random gets called 10 times, and each value is 
> used to select the single element at that position.
>  
> In the second case xdmp:random gets called 1067100 times (10 x # of docs),
> and the count is the number of times xdmp:random(1000)+1 happened to return 
> *the same value* as the position of the document being tested.
>  
> The above example is equivalent to:
>  
> for $i in (1 to 10)
> return count((//*:twitter)[position() = xdmp:random(1000)+1])
>  
> which really amounts to this:
>  
>    for $i in (1 to 10)
>    return count(
>      for $t at $pos in //*:twitter
>      where $pos eq xdmp:random(1000) + 1
>      return $t
>    )
>  
>  
> So this count is returning the number of times  xdmp:random(1000) + 1 returns 
> the same value as the position
>  
>  
>  
> -----------------------------------------------------------------------------
> David Lee
> Lead Engineer
> MarkLogic Corporation
> [email protected]
> Phone: +1 812-482-5224
> Cell:  +1 812-630-7622
> www.marklogic.com
>  
>  
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Will Thompson
> Sent: Wednesday, October 03, 2012 3:19 PM
> To: MarkLogic Developer Discussion
> Subject: [MarkLogic Dev General] Unexpected behavior for xdmp:random() in 
> predicate
>  
> When I use xdmp:random() like this, I expect this to always return 1, but 
> that's not the result.
>  
> for $i in (1 to 10)
> return count((//element)[xdmp:random(1000)+1])
> => 0 1 0 0 1 2 0 2 0 1
>  
> But when I assign the random value to a variable first, the output is as 
> expected and execution time is much faster.
>  
> for $i in (1 to 10)
> let $idx := xdmp:random(1000)+1
> return count((//element)[$idx])
> => 1 1 1 1 1 1 1 1 1 1
>  
> Is there an explanation for this?
>  
> Thanks,
>  
> Will
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
