Thanks, Michael and John, for the thorough and useful responses.

David S.

On Mon, 16 Jan 2012, Michael Blakeley wrote:

> If you think you might want to introduce a search facet on publication-id,
> then I would go with approach (3). Otherwise, (1) and (2) are cheaper and
> pretty much equivalent in performance.
>
> There is also option (4): use an element-value query, without a range index,
> and rely on the automatic element-value indexing. For your primary use case,
> the performance of (4) won't be significantly different from the other
> options, and it will use the least disk space and memory.
>
> Approach (4) only becomes unsuitable if you need to check thousands or
> millions of publication-id values in a single query. At that point each
> list-cache miss can drive an I/O read, which gets expensive for thousands or
> millions. With only five values, though, your list-cache misses on
> publication-id should be few and cheap.
>
> -- Mike
>
> On 16 Jan 2012, at 07:57, David Sewell wrote:
>
>> We're developing a MarkLogic-based project where the data consists of around
>> 100K XML documents. Each document belongs to one of 5 different publications,
>> which need to be differentiated for certain searches. I'm aware of at least
>> three methods of handling this differentiation:
>>
>> 1) assign each document to a collection and use cts:collection-query() or
>> equivalent;
>>
>> 2) load documents into subdirectories, one to each publication, and use
>> cts:directory-query() or equivalent;
>>
>> 3) store publication identifier in the XML data as an element, then create an
>> element range index to enable searches on it.
>>
>> Is there any way to guesstimate which of these approaches will have the best
>> performance when combined with various word and element queries, or will it
>> require empirical testing?
>>
>> David
>>
>> --
>> David Sewell, Editorial and Technical Manager
>> ROTUNDA, The University of Virginia Press
>> PO Box 400314, Charlottesville, VA 22904-4314 USA
>> Email: [email protected] Tel: +1 434 924 9973
>> Web: http://rotunda.upress.virginia.edu/
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
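For comparison, here is a minimal sketch of the four approaches discussed in the thread, each written as a cts:query in MarkLogic XQuery. The element name publication-id comes from Mike's reply. Everything else is a hypothetical stand-in, not something given in the thread. That includes the publication identifier "pub-a", used both as a collection URI and as the directory /pub-a/, and the word query on "liberty", which substitutes for the real word and element queries. For approach (3), the sketch assumes a string element range index on publication-id that uses the default collation. Without that index, evaluating the range query raises an error.

```xquery
xquery version "1.0-ml";

(: Hypothetical values: "pub-a" stands in for a real publication
   identifier, and "liberty" for whatever word/element queries you run. :)
declare variable $pub := "pub-a";
declare variable $words := cts:word-query("liberty");

(: (1) Collection membership: documents loaded into collection "pub-a". :)
let $q1 := cts:collection-query($pub)

(: (2) Directory membership: documents stored under /pub-a/;
   "infinity" also matches documents in deeper subdirectories. :)
let $q2 := cts:directory-query(fn:concat("/", $pub, "/"), "infinity")

(: (3) Range query: assumes a string element range index on
   publication-id (default collation). Remove $q3 from the sequence
   below if no such index is configured, or the search will fail. :)
let $q3 := cts:element-range-query(xs:QName("publication-id"), "=", $pub)

(: (4) Element-value query: no range index, relies on the
   automatic element-value indexing Mike mentions. :)
let $q4 := cts:element-value-query(xs:QName("publication-id"), $pub)

(: Combine each publication constraint with the word query and
   return an index-based estimate of matching documents. :)
for $q in ($q1, $q2, $q3, $q4)
return xdmp:estimate(cts:search(fn:doc(), cts:and-query(($q, $words))))
```

In each case the publication constraint is ANDed with the same word query, so the comparison comes down to which index resolves that constraint. If you do create the range index for (3), cts:element-values() on publication-id can also supply per-publication values and counts for a facet, which is the situation Mike singles out for that approach.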
