Re: [MarkLogic Dev General] Well-factored and optimized XQuery

Michael Blakeley Thu, 03 Mar 2011 16:05:49 -0800

Evan,

Sometimes you have to make a choice between tidy code and fast code. In this 
case you can't easily avoid repeating code. You could probably build an XPath 
string and evaluate it using xdmp:value(), but that might add more complexity 
than it would be worth.
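
For what it's worth, a minimal sketch of the xdmp:value() idea might look 
like the following. The function name local:listed and its parameters are 
hypothetical and not part of RunDMC; the predicates are the ones from your 
message. It would still be worth confirming with xdmp:query-trace() that the 
evaluated path is as searchable as the hand-written one.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: builds the path for one root element name
   as a string and evaluates it with xdmp:value(). :)
declare function local:listed(
  $name as xs:string,
  $public-only as xs:boolean
) as element()*
{
  (: $name must come from trusted code, never from user input,
     because the resulting string is evaluated as XQuery. :)
  xdmp:value(
    fn:concat(
      "/", $name, "[fn:not(@preview-only)]",
      if ($public-only) then "[@status eq 'Published']" else ""
    )
  )
};

(: Example call; in the module this would presumably be passed
   $draft:public-docs-only instead of a literal. :)
local:listed("Announcement", fn:true())
```

Each variable declaration would then shrink to one call, for example 
local:listed("Event", $draft:public-docs-only), at the cost of hiding the 
path inside a string.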


I touched on this point in the XQuery code review post at 
http://blakeley.com/wordpress/archives/518

> Review any function calls within XPath predicates
> Function calls inside an XPath predicate can be horrible for performance, 
> since the function must be called for every item in the predicate’s input 
> sequence. If the result of the function call is static, simply bind the 
> result to a variable. This is also true of operators: even simple math 
> operations like:
> 
>   $list[$start to $start + $size]
> 
> should be rewritten as
> 
>   $list[$start to $stop]
> 
> If you have trouble seeing why this might be a problem, consider a list with 
> 100 items. Now consider this expression:
> 
>   $list[ xdmp:sleep(100) ]
> 
> Evaluation will cost 100-ms per item, or 10 seconds total. Every expression 
> takes a finite amount of time to evaluate, and performance optimization is 
> sometimes a matter of reducing the expression count.


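Those fn:not() calls aren't good for performance either. Consider inverting the 
test: use a positive attribute such as "public" instead of a negative, 
"private"-style one like preview-only, so the predicate tests for something 
that is present rather than something that is absent.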
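
As a rough illustration of the inverted test, assuming each listed document 
carried a hypothetical attribute public="true" (neither the attribute nor its 
value exists in the code you posted), the path would no longer need fn:not():

```xquery
xquery version "1.0-ml";

(: Assumption: listed documents carry a hypothetical public="true"
   attribute, replacing the absence test fn:not(@preview-only). :)
/Announcement[@public eq "true"][@status eq "Published"]
```

Both predicates are then plain attribute-value comparisons.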

-- Mike 

On 3 Mar 2011, at 15:16 , Evan Lenz wrote:

> I'm wondering how to write well-factored and optimized XQuery. Here's an 
> excerpt from the code for RunDMC (on which developer.marklogic.com runs):
> 
> declare variable $Announcements := /Announcement[draft:listed(.)]; (: "News"   :)
> declare variable $Events        := /Event       [draft:listed(.)]; (: "Events" :)
> declare variable $Articles      := /Article     [draft:listed(.)]; (: "Learn"  :)
> declare variable $Posts         := /Post        [draft:listed(.)]; (: "Blog"   :)
> declare variable $Projects      := /Project     [draft:listed(.)]; (: "Code"   :)
> 
> The problem with the above code is that it ignores the indexes. In fact, each 
> one of these expressions filters all the (some 5000, many of them not even 
> XML) fragments in the database. I made one addition to each line that helped 
> things quite a bit:
> 
> declare variable $Announcements := /Announcement/self::*[draft:listed(.)]; (: "News" :)
> 
> This makes the first step searchable, since node tests by themselves 
> ("Announcement") aren't allowed to be searchable, only full steps. By moving 
> the unsearchable predicate into a separate (subsequent) step, I've now 
> reduced the number of fragments to filter down to, say, 30 <Announcement> 
> docs instead of all ~5000 fragments. The addition of the extra "/self::*" is 
> not particularly pretty, but it's not terrible either, and it was a small 
> change with a huge positive impact (at least according to query-trace()).
> 
> But I'd like to do better. To make the paths fully searchable, I'll need to 
> pull my constraint out of my function call. Now I have:
> 
> declare variable $Announcements :=
>   if ($draft:public-docs-only)
>   then /Announcement[fn:not(@preview-only)][@status eq 'Published']
>   else /Announcement[fn:not(@preview-only)];
> 
> declare variable $Events :=
>   if ($draft:public-docs-only)
>   then /Event[fn:not(@preview-only)][@status eq 'Published']
>   else /Event[fn:not(@preview-only)];
> 
> declare variable $Articles :=
>   if ($draft:public-docs-only)
>   then /Article[fn:not(@preview-only)][@status eq 'Published']
>   else /Article[fn:not(@preview-only)];
> 
> declare variable $Posts :=
>   if ($draft:public-docs-only)
>   then /Post[fn:not(@preview-only)][@status eq 'Published']
>   else /Post[fn:not(@preview-only)];
> 
> declare variable $Projects :=
>   if ($draft:public-docs-only)
>   then /Project[fn:not(@preview-only)][@status eq 'Published']
>   else /Project[fn:not(@preview-only)];
> 
> Now, all of my XPath expressions are fully searchable, but things have gotten 
> messy and a bunch of code is duplicated. There are also cases where I'll want 
> to use range indexes to further constrain the results (such as "get the 
> latest two Announcements"), so this will likely only get worse, because as 
> far as I can tell, a path that references a variable is unsearchable, e.g., 
> $Announcements[date gt …]
> 
> Am I pushing index usage too far? Is there another way that I'm not seeing? I 
> assume I'm just not approaching things the correct "MarkLogic way". Any ideas 
> would be greatly appreciated.
> 
> Thanks,
> Evan
> 
> Evan Lenz
> Software Developer, Community
> MarkLogic Corporation
> 
> Phone +1 360 297 0087
> email  [email protected]
> web    developer.marklogic.com
> 
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
