Mike,
Thanks for the insights and the article link. I liked your idea of using
xdmp:value(). In XSLT, I sometimes declare entities in the internal DTD subset
and use that as a macro mechanism for defining reusable match patterns. I know
xdmp:value() does dynamic evaluation rather than static macro substitution, but
it will do for my purposes.
Here's what I came up with using xdmp:value():
declare variable $Announcements := docs('Announcement');
declare variable $Events        := docs('Event');
declare variable $Articles      := docs('Article');
declare variable $Posts         := docs('Post');
declare variable $Projects      := docs('Project');
(: Build the whole path as a string, then evaluate it, so each variable
   gets a fully searchable path without repeating the predicates. :)
declare function docs($element-name) {
  let $expr := fn:concat("if ($draft:public-docs-only) ",
                         "then /", $element-name, "[fn:not(@preview-only)][@status eq 'Published'] ",
                         "else /", $element-name, "[fn:not(@preview-only)]")
  return
    xdmp:value($expr)
};
Since I want to extend this further (making use of range indexes, for example),
I'll probably pull xdmp:value() out of the docs() function so that docs() just
returns the expression part (a string). This is a different model than I'm used
to and I'm sure there are more things to consider, but I like knowing that I
have this option. Basically: building up expressions rather than values opens
up one possible way to exploit indexes without duplicating code.
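A minimal sketch of that next step, assuming MarkLogic's 1.0-ml dialect. The names local:docs-expr, local:docs and $public-docs-only are hypothetical stand-ins (the real $draft:public-docs-only lives in the RunDMC draft module, which isn't shown here), and the date predicate is illustrative only. One design choice differs from docs() above: the public/draft test is resolved while the string is being built, so the evaluated path contains only the branch that applies.

```xquery
xquery version "1.0-ml";

(: Hypothetical stand-in for $draft:public-docs-only. :)
declare variable $public-docs-only as xs:boolean := fn:true();

(: Return the listing path for $element-name as a string instead of
   evaluating it, so callers can append more predicates first. :)
declare function local:docs-expr($element-name as xs:string) as xs:string
{
  fn:concat(
    "/", $element-name, "[fn:not(@preview-only)]",
    if ($public-docs-only) then "[@status eq 'Published']" else ""
  )
};

(: Evaluate a path string built by local:docs-expr. :)
declare function local:docs($expr as xs:string) as node()*
{
  xdmp:value($expr)
};

(: Hypothetical extra constraint appended before evaluation. The general
   comparison (>) casts an untyped "date" element to xs:date. :)
local:docs(
  fn:concat(local:docs-expr("Announcement"),
            "[date > xs:date('2011-01-01')]"))
```

Because the constraint is appended to the string rather than applied to an already-bound variable, the evaluated expression is a single path.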
Evan Lenz
Software Developer, Community
MarkLogic Corporation
On 3/3/11 4:05 PM, "Michael Blakeley" <[email protected]> wrote:
Evan,
Sometimes you have to make a choice between tidy code and fast code. In this
case you can't easily avoid repeating code. You could probably build an XPath
string and evaluate it using xdmp:value(), but that might add more complexity
than it would be worth.
I touched on this point in the XQuery code review post at
http://blakeley.com/wordpress/archives/518
Review any function calls within XPath predicates
Function calls inside an XPath predicate can be horrible for performance, since
the function must be called for every item in the predicate’s input sequence.
If the result of the function call is static, simply bind the result to a
variable. This is also true of operators: even simple math operations like:
$list[$start to $start + $size]
should be rewritten as
$list[$start to $stop]
with $stop bound once, beforehand, to $start + $size.
If you have trouble seeing why this might be a problem, consider a list with
100 items. Now consider this expression:
$list[ xdmp:sleep(100) ]
Evaluation will cost 100 ms per item, or 10 seconds total. Every expression
takes a finite amount of time to evaluate, and performance optimization is
sometimes a matter of reducing the expression count.
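A small sketch of the "bind it first" advice, with made-up values for $list, $start and $size. It keeps the inclusive range from the example above but uses an explicit position() test, since standard XQuery only treats a predicate positionally when it evaluates to a single number.

```xquery
xquery version "1.0";

let $list  := 1 to 100
let $start := 11
let $size  := 10
(: Bind the upper bound once, outside the predicate, instead of
   re-evaluating $start + $size for every item in $list. :)
let $stop  := $start + $size
(: Portable positional filter over the inclusive range $start..$stop. :)
return $list[fn:position() ge $start and fn:position() le $stop]
```

With these values the query returns the items at positions 11 through 21.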
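Those fn:not() calls aren't good for performance either. Consider inverting the
test: mark listable documents with a positive attribute (say, "public") rather
than excluding them with a negative one (like "private", or @preview-only here).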
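Mike's suggestion, sketched in 1.0-ml. The @public attribute is hypothetical: nothing in the thread shows it, and this assumes every listable document would be re-marked with it.

```xquery
xquery version "1.0-ml";

(: Current style: select documents by the ABSENCE of @preview-only,
   which needs an fn:not() call in the predicate. :)
declare variable $listed-current := /Announcement[fn:not(@preview-only)];

(: Inverted style: a plain presence check on a hypothetical @public
   attribute, with no fn:not() call. :)
declare variable $listed-inverted := /Announcement[@public];

fn:count($listed-current), fn:count($listed-inverted)
```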
-- Mike
On 3 Mar 2011, at 15:16, Evan Lenz wrote:
I'm wondering how to write well-factored and optimized XQuery. Here's an
excerpt from the code for RunDMC (on which developer.marklogic.com runs):
declare variable $Announcements := /Announcement[draft:listed(.)]; (: "News"   :)
declare variable $Events        := /Event       [draft:listed(.)]; (: "Events" :)
declare variable $Articles      := /Article     [draft:listed(.)]; (: "Learn"  :)
declare variable $Posts         := /Post        [draft:listed(.)]; (: "Blog"   :)
declare variable $Projects      := /Project     [draft:listed(.)]; (: "Code"   :)
The problem with the above code is that it ignores the indexes. In fact, each
of these expressions filters every fragment in the database (some 5000, many of
them not even XML). I made one addition to each line that helped things quite a
bit:
declare variable $Announcements := /Announcement/self::*[draft:listed(.)]; (: "News" :)
This makes the first step searchable. Searchability applies to whole steps, not
to node tests by themselves ("Announcement"), so a step with an unsearchable
predicate is unsearchable as a whole. By moving the unsearchable predicate into a
separate (subsequent) step, I've reduced the number of fragments to filter from
all ~5000 down to, say, 30 <Announcement> docs. The extra "/self::*" is not
particularly pretty, but it's not terrible either, and it was a small change
with a huge positive impact (at least according to xdmp:query-trace()).
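A sketch of how the two forms can be compared with query tracing in 1.0-ml. local:listed is a hypothetical stand-in for draft:listed(), whose definition isn't shown in the thread, so it only checks @preview-only here. Query-trace messages go to the server's error log rather than to the query result.

```xquery
xquery version "1.0-ml";

(: Hypothetical stand-in for draft:listed(). :)
declare function local:listed($doc as element()) as xs:boolean
{
  fn:not($doc/@preview-only)
};

(: Turn on query tracing for the rest of this request. :)
xdmp:query-trace(fn:true()),

(: Function call in the first step's predicate: the whole step is
   unsearchable, so every fragment is filtered. :)
fn:count(/Announcement[local:listed(.)]),

(: Predicate moved to a trailing self::* step: /Announcement can be
   resolved from the indexes before local:listed() filters the results. :)
fn:count(/Announcement/self::*[local:listed(.)])
```

Both counts should match; only the amount of filtering reported in the trace should differ.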
But I'd like to do better. To make the paths fully searchable, I'll need to
pull my constraint out of my function call. Now I have:
declare variable $Announcements :=
  if ($draft:public-docs-only)
  then /Announcement[fn:not(@preview-only)][@status eq 'Published']
  else /Announcement[fn:not(@preview-only)];
declare variable $Events :=
  if ($draft:public-docs-only)
  then /Event[fn:not(@preview-only)][@status eq 'Published']
  else /Event[fn:not(@preview-only)];
declare variable $Articles :=
  if ($draft:public-docs-only)
  then /Article[fn:not(@preview-only)][@status eq 'Published']
  else /Article[fn:not(@preview-only)];
declare variable $Posts :=
  if ($draft:public-docs-only)
  then /Post[fn:not(@preview-only)][@status eq 'Published']
  else /Post[fn:not(@preview-only)];
declare variable $Projects :=
  if ($draft:public-docs-only)
  then /Project[fn:not(@preview-only)][@status eq 'Published']
  else /Project[fn:not(@preview-only)];
Now, all of my XPath expressions are fully searchable, but things have gotten
messy and a bunch of code is duplicated. There are also cases where I'll want
to use range indexes to further constrain the results (such as "get the latest
two Announcements"), so this will likely only get worse, because as far as I
can tell, a path that references a variable is unsearchable, e.g.,
$Announcements[date gt …]
Am I pushing index usage too far? Is there another way that I'm not seeing? I
assume I'm just not approaching things in the correct "MarkLogic way". Any ideas
would be greatly appreciated.
Thanks,
Evan
Evan Lenz
Software Developer, Community
MarkLogic Corporation
Phone +1 360 297 0087
email [email protected]
web developer.marklogic.com
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general