Re: [MarkLogic Dev General] derferencing documents with document-uri and base-uri?

David Lee Tue, 22 Oct 2013 06:47:14 -0700

The document URI is normatively stored in the disk block with the document data
and properties, so it does require loading the document into memory to get its
URI, provided you are referencing it with a document node.

If the document is pulled into memory for the sole purpose of getting its URI,
it can be slow.
To test this I have a DB with 1.6 million tweets.
Even after running them once, these calls are slow on my system:

count( doc()/fn:base-uri() )         1 min 25 sec
count( doc()/fn:document-uri() )     1 min 26 sec
count( doc()/xdmp:node-uri(.) )      1 min 22 sec
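
A minimal sketch of how one of these timings might be reproduced in Query
Console. It assumes xdmp:elapsed-time() is used to read the elapsed time of the
request; the measurements above may have been taken differently.

```xquery
xquery version "1.0-ml";

(: Touch every document just to read its URI. Each document has to be
   loaded to answer this, which is what makes the call slow. :)
let $count := fn:count(fn:doc()/xdmp:node-uri(.))
(: xdmp:elapsed-time() reports the time elapsed since the request started. :)
return ($count, xdmp:elapsed-time())
```

Swapping xdmp:node-uri(.) for fn:base-uri() or fn:document-uri() gives the
other two variants.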

But if all you want are URIs, consider the URI lexicon. This lexicon is
stored separately from the documents, all together, so iterating through all
the URIs is much faster.
Even without using the advanced filtering options this can be fast:

count( cts:uris() )                  0.36 seconds

If you are dealing with billions of docs instead of a million, then you should
definitely use the advanced options for this call to retrieve only the URIs
that you want.
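
As a hedged illustration of the "advanced options", cts:uris accepts a start
value, a list of options, and a cts:query that restricts which URIs come back.
The directory "/tweets/" and the limit of 100 below are assumptions made up for
this sketch, and the URI lexicon must be enabled on the database.

```xquery
xquery version "1.0-ml";

(: Return at most 100 URIs, read from the URI lexicon, for documents
   under the hypothetical directory "/tweets/" (any depth). :)
cts:uris(
  (),                                          (: no start value :)
  ("limit=100"),                               (: cap the number of URIs returned :)
  cts:directory-query("/tweets/", "infinity")  (: restrict by directory :)
)
```

Because the query is resolved from the indexes and the URIs come from the
lexicon, the documents themselves should not need to be loaded.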

If the document is already in memory, fetching its URI is fast (and I don't
know of another way than using one of the above xxx-uri() functions).

-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
[email protected]
Phone: +1 812-482-5224
Cell:  +1 812-630-7622
www.marklogic.com<http://www.marklogic.com/>

From: [email protected] 
[mailto:[email protected]] On Behalf Of anoop raj p
Sent: Tuesday, October 22, 2013 6:15 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] derferencing documents with document-uri 
and base-uri?

Please remove me from email list.

On Tue, Oct 22, 2013 at 3:44 PM, Rachel Wilson 
<[email protected]<mailto:[email protected]>> wrote:
I didn't think it was a problem as such. I wasn't trying to prematurely
optimise, I promise, but I was curious about the workings under the hood
since we use these functions a lot, including in our slower-running queries -
investigating those is how this question came up.  Think about this as
settling a bet ;)

So, I'm still curious - what is dereferencing? Is that indeed what
happens?

Say we have a database node returned from a query, which isn't the
document node, and we call base-uri on it. Would the whole document itself
necessarily have been put in the expanded tree cache in order to resolve
the query?  I'm still learning about the roles of the different caches and
it's turning out to be very helpful to know.

PS.  We don't have subfragments

-----Original Message-----
From: Michael Blakeley <[email protected]<mailto:[email protected]>>
Reply-To: MarkLogic Developer Discussion 
<[email protected]<mailto:[email protected]>>
Date: Monday, 21 October 2013 18:39
To: MarkLogic Developer Discussion 
<[email protected]<mailto:[email protected]>>
Subject: Re: [MarkLogic Dev General] derferencing documents
with document-uri and base-uri?

I wouldn't worry about it unless it's clearly a problem: avoid premature
optimization. If you have a database node in memory, then it's in the
expanded tree cache. So repeated accessor calls for its URI can drive
cache lookups and CPU cycles, but should never result in cache misses.
Check the xdmp:query-meters output to see this for yourself: you should be
able to correlate the number of URI accesses to the
expanded-tree-cache-hit count.
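
A rough sketch of that check, assuming a database with at least a few
documents in it. The choice of the first 100 documents is arbitrary and only
there for illustration.

```xquery
xquery version "1.0-ml";

(: Read the URI of the root element of the first 100 documents. :)
let $uris :=
  for $node in (fn:doc())[1 to 100]/*
  return xdmp:node-uri($node)
(: xdmp:query-meters() returns the meters for the current request,
   including the expanded tree cache hit and miss counts. :)
return (fn:count($uris), xdmp:query-meters())
```

Running it with and without the URI accessor, and comparing the cache counts
in the two outputs, is one way to see how the lookups correlate.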

Things might get a little more expensive if you have subfragments, because
crossing fragment boundaries can be expensive. A call to base-uri inside a
subfragment might have to traverse to the parent fragment - or maybe not,
I'd have to design a test to say for certain. But the time to worry is
when you have a performance problem, and your test case shows the URI
accessor in the profiler output. Then you could think about ways to
minimize URI lookups.

Switching to functionality, I almost always use xdmp:node-uri rather than
document-uri or base-uri. I avoid document-uri simply because I don't want
to worry about traversing to root for document-uri, and base-uri because I
don't want the behavior where an ancestor element specifies its own
base-uri value. That's rare in most XML, but base-uri checks for it and
honors it. Checking for that probably slows things down a bit, and
honoring it generally doesn't do what I want. So I always use
xdmp:node-uri instead.
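
To make the difference concrete, here is a small sketch comparing the three
accessors on the same node. The document "/example.xml" is hypothetical and is
assumed to be stored in the database with an xml:base attribute on one of its
elements.

```xquery
xquery version "1.0-ml";

(: Pick a child of the first element that declares its own xml:base. :)
let $node := fn:doc("/example.xml")//*[@xml:base][1]/*[1]
return (
  (: honors the ancestor's xml:base, so this may differ from the document URI :)
  fn:base-uri($node),
  (: the URI of the document containing the node, whatever xml:base says :)
  xdmp:node-uri($node),
  (: document-uri only answers for a document node, hence the trip to the root :)
  fn:document-uri(fn:root($node))
)
```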

-- Mike

On 21 Oct 2013, at 09:54 , Rachel Wilson 
<[email protected]<mailto:[email protected]>> wrote:

>
> I have heard on the grapevine that to use document-uri() or base-uri()
>functions is bad for performance, although I can't seem to find anything
>about that in MarkLogic's docs or elsewhere on the internet.  One of the
>reasons given was that those functions "dereference the document",
>or that MarkLogic Server has to go to disk to resolve the uri.  Although
>I'm not sure what is really meant by "dereference".
>
> Could someone clear this up?  Has the grapevine got the wrong end of the
>stick or is it perhaps how the function is used, perhaps in loops, that
>is the reason behind this thinking?  We use those two functions so much,
>particularly base-uri(), in our code that we would consider some rewrites
>if it really is something to minimise.
>
> Many thanks,
> Rachel
>
>
>
> ----------------------------
>
> http://www.bbc.co.uk
> This e-mail (and any attachments) is confidential and may contain
>personal views which are not the views of the BBC unless specifically
>stated.
> If you have received it in error, please delete it from your system.
> Do not use, copy or disclose the information in any way nor act in
>reliance on it and notify the sender immediately.
> Please note that the BBC monitors e-mails sent or received.
> Further communication will signify your consent to this.
>
> ---------------------
>
> _______________________________________________
> General mailing list
> [email protected]<mailto:[email protected]>
> http://developer.marklogic.com/mailman/listinfo/general

--
anoop raj p
