Re: [MarkLogic Dev General] collection() vs. xdmp:directory() vs. element range index performance

John Snelson Mon, 16 Jan 2012 09:02:54 -0800

On 16/01/12 15:57, David Sewell wrote:
> We're developing a MarkLogic-based project where the data consists of around
> 100K XML documents. Each document belongs to one of 5 different publications,
> which need to be differentiated for certain searches. I'm aware of at least
> three methods of handling this differentiation:
>
> 1) assign each document to a collection and use cts:collection-query() or
> equivalent;
>
> 2) load documents into subdirectories, one to each publication, and use
> cts:directory-query() or equivalent;
>
> 3) store publication identifier in the XML data as an element, then create an
> element range index to enable searches on it.
>
> Is there any way to guesstimate which of these approaches will have the best
> performance when combined with various word and element queries, or will it
> require empirical testing?


My guess is that, for unfiltered search, the collection-based method will
be about as fast as the embedded XML identifier with a range index.
Filtering might make the embedded XML ID slower. Using directories is
probably a little slower than either of the other two. Note also that
directory creation has overheads associated with it and can create a
concurrency hotspot.
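
Since these are guesses, an empirical comparison may still be worthwhile.
A minimal sketch of the three approaches as unfiltered searches follows.
The collection URI "pub-a", the directory "/pub-a/", the element name
"publication", and the search word "whale" are all hypothetical names
chosen for illustration, and the third query assumes a string element
range index on that element has already been configured with a matching
collation.

```xquery
xquery version "1.0-ml";

(: All names below ("pub-a", "/pub-a/", publication, "whale") are
   hypothetical and only illustrate the shape of each query. :)
let $word := cts:word-query("whale")

(: 1) Documents assigned to a collection. :)
let $by-collection :=
  cts:search(fn:doc(),
    cts:and-query((cts:collection-query("pub-a"), $word)),
    "unfiltered")

(: 2) Documents loaded under a directory; "infinity" also matches
      documents in subdirectories. :)
let $by-directory :=
  cts:search(fn:doc(),
    cts:and-query((cts:directory-query("/pub-a/", "infinity"), $word)),
    "unfiltered")

(: 3) Publication identifier stored in the XML; assumes an element
      range index of type string exists on <publication>. :)
let $by-range :=
  cts:search(fn:doc(),
    cts:and-query((
      cts:element-range-query(xs:QName("publication"), "=", "pub-a"),
      $word)),
    "unfiltered")

(: Counting forces each search to be evaluated. :)
return (
  fn:count($by-collection),
  fn:count($by-directory),
  fn:count($by-range)
)
```

Running each search separately against a realistically sized database,
and repeating it with the "filtered" option, would show whether the
differences described above matter for a given workload.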

John

-- 
John Snelson, Senior Engineer                  http://twitter.com/jpcs
MarkLogic Corporation                         http://www.marklogic.com
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
