I've run into something that is very surprising to me, coming from an
'XML file' processing mentality.

With MarkLogic I've discovered that, for a large document (even one with
lots of fragments), binding the document to a variable and querying
through that variable is vastly slower than repeating the doc() call
inline. This is entirely opposite to my expectations, but it does make
some kind of sense ...

My question is: is this expected behavior? Or have I stumbled on
something truly weird and unexpected, or a bug?

 

Example: the XML file here is 52 MB, but it is set up with a fragment
root so that each fragment is about 1 KB at most.

 

 

declare variable $id external ;

 

FAST 

 

declare variable $c :=
  doc("/NDFRT/NDFRT_Public_2009.05.12_TDE.xml")//conceptDef[id eq $id];
declare variable $ns :=
  doc("/NDFRT/NDFRT_Public_2009.05.12_TDE.xml")//namespaceDef[id eq $c/namespace];

 

SLOW  - about 100x slower

 

declare variable $doc := doc("/NDFRT/NDFRT_Public_2009.05.12_TDE.xml"); 

declare variable $c := $doc//conceptDef[id eq $id];
declare variable $ns := $doc//namespaceDef[id eq $c/namespace];

 

 

I'm used to the latter convention, but I did expect MarkLogic to
optimize the two forms to be the same under the hood.

My *guess* is that by doing  

 

declare variable $doc := doc("/NDFRT/NDFRT_Public_2009.05.12_TDE.xml"); 

 

it's loading the entire doc into memory instead of keeping a reference
to it,

whereas reusing doc() within an XPath expression goes right to the
indexes.
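
One way to check this guess, as a rough sketch: run each form as its own
query and return xdmp:query-meters() alongside the result, then compare
the elapsed time and cache counters between the two runs. The layout
below (returning $c first so it is evaluated before the meters are read)
is an assumption, not a verified recipe, and the sketch does not predict
what the meters will show for either form.

```xquery
xquery version "1.0-ml";

(: Sketch only: run once as written (inline doc() form), then run again
   with the variable form from the comment below swapped in. :)
declare variable $id external;

declare variable $c :=
  doc("/NDFRT/NDFRT_Public_2009.05.12_TDE.xml")//conceptDef[id eq $id];

(: Variable form to compare against:
   declare variable $doc := doc("/NDFRT/NDFRT_Public_2009.05.12_TDE.xml");
   declare variable $c := $doc//conceptDef[id eq $id];
:)

(
  (: Return the result first so the lookup has happened (assumed ordering). :)
  $c,
  (: Resource meters for this request, e.g. elapsed time and cache hits/misses. :)
  xdmp:query-meters()
)
```

If the variable form reports far more cache misses or fragments touched
than the inline form, that would be consistent with the whole document
being pulled in rather than resolved from the indexes.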

 

Is this conclusion at all sane? I'm mainly curious whether, when writing
future queries, I should stick with the more verbose form and NOT bind a
variable to a big document.

----------------------------------------

David A. Lee

Senior Principal Software Engineer

Epocrates, Inc.

[email protected] <mailto:[email protected]> 

812-482-5224

 

 

