I've run into something that surprised me, coming from an 'XML file'
processing mentality.
With MarkLogic I've found that, for a large document, even one split
into many fragments, binding the document to a variable is vastly
slower than calling doc() inline each time. This is the opposite of
what I expected, although it does make some kind of sense ...
My question: is this expected behavior? Or have I stumbled on something
truly weird and unexpected, or even a bug?
Example: the XML file here is 52 MB, but it is set up with a fragment
root so that each fragment is about 1k max.

declare variable $id external;

FAST:

declare variable $c :=
  doc("/NDFRT/NDFRT_Public_2009.05.12_TDE.xml")//conceptDef[id eq $id];
declare variable $ns :=
  doc("/NDFRT/NDFRT_Public_2009.05.12_TDE.xml")//namespaceDef[id eq $c/namespace];

SLOW (about 100x slower):

declare variable $doc := doc("/NDFRT/NDFRT_Public_2009.05.12_TDE.xml");
declare variable $c := $doc//conceptDef[id eq $id];
declare variable $ns := $doc//namespaceDef[id eq $c/namespace];

I'm used to the latter convention, but I did expect MarkLogic to
optimize the two forms to be the same under the hood.
My *guess* is that

declare variable $doc := doc("/NDFRT/NDFRT_Public_2009.05.12_TDE.xml");

loads the entire doc into memory instead of keeping a reference to it,
whereas calling doc() directly within an XPath expression goes right to
the indexes.
Is this conclusion at all sane? I'm mainly curious whether, when
writing future queries, I should stick with the more verbose form and
NOT bind a variable to a big document.
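
For anyone who wants to reproduce the comparison, here is a rough timing
sketch, not the exact queries above. It assumes the same document URI and
external $id, and uses xdmp:elapsed-time() to time each form; the
count() calls and the $n-inline / $n-var names are only there for
illustration, to force each path to be evaluated where it is written.
MarkLogic may evaluate expressions lazily, and the second form runs with
whatever caches the first one warmed, so treat the numbers as indicative
only and try swapping the order.

```xquery
xquery version "1.0-ml";

(: Hypothetical timing harness; names other than $id are illustrative. :)
declare variable $id external;

let $t0 := xdmp:elapsed-time()

(: Inline form: doc() appears directly in the path expression. :)
let $n-inline :=
  count(doc("/NDFRT/NDFRT_Public_2009.05.12_TDE.xml")//conceptDef[id eq $id])
let $t1 := xdmp:elapsed-time()

(: Variable form: the document node is bound first, then searched. :)
let $doc := doc("/NDFRT/NDFRT_Public_2009.05.12_TDE.xml")
let $n-var := count($doc//conceptDef[id eq $id])
let $t2 := xdmp:elapsed-time()

return (
  concat("inline:   ", $n-inline, " match(es) in ", string($t1 - $t0)),
  concat("variable: ", $n-var, " match(es) in ", string($t2 - $t1))
)
```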
----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
[email protected] <mailto:[email protected]>
812-482-5224