We're developing a MarkLogic-based project whose data consists of around 
100K XML documents. Each document belongs to one of five publications, 
which need to be differentiated for certain searches. I'm aware of at least 
three methods of handling this differentiation:

1) assign each document to a collection and use cts:collection-query() or 
equivalent;

2) load documents into subdirectories, one per publication, and use 
cts:directory-query() or equivalent;

3) store a publication identifier in the XML data as an element, then create 
an element range index on that element to enable searches on it.
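For concreteness, here is a rough sketch of what each of the three 
restrictions might look like when combined with a word query. It could 
serve as a starting point for an empirical comparison. Everything named 
here is hypothetical and only for illustration: the collection URI 
"/pub/pubA", the directory "/pubA/", the element <publication>, its value 
"pubA", and the search term. Option 3 also assumes that a string element 
range index on <publication> has been created with the default collation.

```xquery
xquery version "1.0-ml";

(: Hypothetical names: "/pub/pubA", "/pubA/", <publication>, "pubA", and
   the word "liberty" are illustrations only, not a real schema. :)
let $words := cts:word-query("liberty")

(: 1) Restrict by collection membership :)
let $by-collection :=
  cts:and-query(($words, cts:collection-query("/pub/pubA")))

(: 2) Restrict by directory; depth "infinity" also matches subdirectories :)
let $by-directory :=
  cts:and-query(($words, cts:directory-query("/pubA/", "infinity")))

(: 3) Restrict by element value; cts:element-range-query needs a range
   index on <publication> (assumed: xs:string, default collation) :)
let $by-element :=
  cts:and-query(($words,
    cts:element-range-query(xs:QName("publication"), "=", "pubA")))

(: Unfiltered estimates for each variant, resolved from the indexes :)
return (
  xdmp:estimate(cts:search(fn:doc(), $by-collection)),
  xdmp:estimate(cts:search(fn:doc(), $by-directory)),
  xdmp:estimate(cts:search(fn:doc(), $by-element))
)
```

Running these against the real document set would be one way to settle 
the question empirically.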

Is there any way to guesstimate which of these approaches will have the best 
performance when combined with various word and element queries, or will it 
require empirical testing?

David

-- 
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 400314, Charlottesville, VA 22904-4314 USA
Email: [email protected]   Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
