Your guess is pretty close to accurate.

The XPath expression starting with doc() is a "searchable expression" and uses 
indexes to optimize its execution.

The XPath starting with $doc is not searchable and can't use indexes, so it uses
a brute-force search within the document (which means reading and searching
needless fragments). The optimizer doesn't (at least not yet) realize that $doc is
just a single stored document node that wasn't built dynamically, which means the
expression could be rewritten as a searchable expression.

Interestingly, if your doc weren't so heavily fragmented, you probably wouldn't
see such a big difference.

-jh-

On Nov 17, 2009, at 12:47 PM, Lee, David wrote:

> I’ve run into something that is very surprising to me, coming from an ‘xml
> file’ processing mentality.
> When using MarkLogic I’ve discovered that, with a large document, even
> with lots of fragments, storing the document in a variable
> is vastly slower than reusing the doc() function inline. This is entirely
> opposite to my expectations but does make some kind of sense ...
> My question is: is this expected? Or have I stumbled on something
> truly weird and unexpected? Or a bug?
>  
> Example: this is a 52 MB XML file, but with a fragment root set so that
> each fragment is about 1 KB max.
>  
>  
> declare variable $id external ;
>  
> FAST
>  
> declare variable $c :=
>   doc("/NDFRT/NDFRT_Public_2009.05.12_TDE.xml")//conceptDef[id eq $id];
> declare variable $ns :=
>   doc("/NDFRT/NDFRT_Public_2009.05.12_TDE.xml")//namespaceDef[id eq $c/namespace];
>  
> SLOW  - about 100x slower
>  
> declare variable $doc := doc("/NDFRT/NDFRT_Public_2009.05.12_TDE.xml");
> declare variable $c := $doc//conceptDef[id eq $id];
> declare variable $ns := $doc//namespaceDef[id eq $c/namespace];
>  
>  
> I’m used to using the latter convention ... but I did ‘expect’ MarkLogic to
> optimize the two ways of doing things to be the same under the hood.
> My *guess* is that by doing  
>  
> declare variable $doc := doc("/NDFRT/NDFRT_Public_2009.05.12_TDE.xml");
>  
> It's loading the entire doc into memory, instead of keeping a reference to it
> ...
> Whereas re-using the doc() within an xpath expression goes right to the 
> indexes.
>  
> Is this conclusion at all sane? I’m mainly curious, for writing future
> queries, whether I should stick with the more verbose form and NOT set a
> variable to a big document.
>  
> ----------------------------------------
> David A. Lee
> Senior Principal Software Engineer
> Epocrates, Inc.
> [email protected]
> 812-482-5224
>  
>  
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general