If you think you might want to introduce a search facet on publication-id, then I would go with approach (3), because a facet needs a range index on that element. Otherwise, (1) and (2) are cheaper and pretty much equivalent in performance.
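A rough XQuery sketch (MarkLogic 1.0-ml dialect) of how each approach becomes a cts:query constraint combined with a word query, plus the publication-id facet that only approach (3) allows. The collection URI "pub-a", the directory "/pub-a/", and the search word are hypothetical placeholders. The range query and the facet assume a string element range index on publication-id using the default collation. Option (4) is the element-value query described in the next paragraph. Treat this as an untested illustration, not a drop-in query.

```xquery
xquery version "1.0-ml";

(: Hypothetical placeholders: "pub-a", "/pub-a/", and the word "liberty". :)
declare variable $text := cts:word-query("liberty");

(: (1) collection membership :)
declare variable $by-collection := cts:collection-query("pub-a");

(: (2) directory; "infinity" also matches documents in subdirectories :)
declare variable $by-directory := cts:directory-query("/pub-a/", "infinity");

(: (3) requires a string element range index on publication-id :)
declare variable $by-range :=
  cts:element-range-query(xs:QName("publication-id"), "=", "pub-a");

(: (4) element-value query; relies on automatic element-value indexing,
   no range index required :)
declare variable $by-value :=
  cts:element-value-query(xs:QName("publication-id"), "pub-a");

(: Swap in any of the four constraints; returns the first 10 matches. :)
(cts:search(fn:doc(), cts:and-query(($text, $by-collection))))[1 to 10],

(: Facet counts per publication: needs the range index from (3). :)
for $v in cts:element-values(xs:QName("publication-id"), (), (), $text)
return fn:concat($v, ": ", cts:frequency($v))
```

Because cts:element-values reads directly from the range index, the facet is cheap with (3) and unavailable with (1), (2), or (4) on this element.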
There is also option (4): use an element-value query (cts:element-value-query()) without a range index, and rely on the automatic element-value indexing. For your primary use case, the performance of (4) won't be significantly different from the other options, and it will use the least disk space and memory. Approach (4) only becomes unsuitable if a single query has to check thousands or millions of publication-id values. At that scale each list-cache miss can trigger an I/O read, and those reads add up quickly. With only five values, though, your list-cache misses on publication-id should be few and cheap.

-- Mike

On 16 Jan 2012, at 07:57, David Sewell wrote:

> We're developing a MarkLogic-based project where the data consists of around
> 100K XML documents. Each document belongs to one of 5 different publications,
> which need to be differentiated for certain searches. I'm aware of at least
> three methods of handling this differentiation:
>
> 1) assign each document to a collection and use cts:collection-query() or
> equivalent;
>
> 2) load documents into subdirectories, one to each publication, and use
> cts:directory-query() or equivalent;
>
> 3) store publication identifier in the XML data as an element, then create an
> element range index to enable searches on it.
>
> Is there any way to guesstimate which of these approaches will have the best
> performance when combined with various word and element queries, or will it
> require empirical testing?
>
> David
>
> --
> David Sewell, Editorial and Technical Manager
> ROTUNDA, The University of Virginia Press
> PO Box 400314, Charlottesville, VA 22904-4314 USA
> Email: [email protected] Tel: +1 434 924 9973
> Web: http://rotunda.upress.virginia.edu/
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
