You can do faceting with collections too (option #1). Each collection URI
would embed the facet name and value, e.g. pub/web, pub/print.
Then you can use a collection constraint. Assuming "pub/" as the URI
prefix, you'd pass this in the options node for search:search():
<constraint name="pub">
  <collection prefix="pub/"/>
</constraint>
The Search API then uses cts:collection-match("pub/*") under the covers for
fast retrieval of the facet values from the collection lexicon.
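For what it's worth, here is a minimal sketch of the full call, assuming your
documents are already in collections named pub/web, pub/print, and so on, and
that the collection lexicon is enabled on the database. The query text
"pub:web" is only an illustration; substitute whatever your application
passes in.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
    at "/MarkLogic/appservices/search/search.xqy";

(: Options node declaring a collection constraint named "pub".
   The prefix is stripped from each collection URI, so the
   collection pub/web is exposed as the facet value "web". :)
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="pub">
      <collection prefix="pub/"/>
    </constraint>
  </options>

(: "pub:web" restricts results to documents in the pub/web collection
   (an assumed example collection); the response also carries a facet
   element listing the pub values and their counts. :)
return search:search("pub:web", $options)
```

Passing an empty query string instead of "pub:web" should return the facet
counts across all of the pub/ collections.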
This is how we do faceted search on the Developer Community website, as
described here:
http://developer.marklogic.com/blog/collection-constraints-are-cool
Evan Lenz
Software Developer, Community
MarkLogic Corporation
http://developer.marklogic.com
On 1/16/12 9:03 AM, "Michael Blakeley" <[email protected]> wrote:
>If you think you might want to introduce a search facet on
>publication-id, then I would go with approach (3). Otherwise, (1) and (2)
>are cheaper and pretty much equivalent in performance.
>
>There is also option (4): use an element-value query, without a range
>index, and rely on the automatic element-value indexing. For your primary
>use case, the performance of (4) won't be significantly different from
>the other options, and it will use the least disk space and memory.
>
>Approach (4) only becomes unsuitable if you need to check thousands or
>millions of publication-id values in a single query. At that point each
>list-cache miss can drive an I/O read, which gets expensive for thousands
>or millions. With only five values, though, your list-cache misses on
>publication-id should be few and cheap.
>
>-- Mike
>
>On 16 Jan 2012, at 07:57 , David Sewell wrote:
>
>> We're developing a MarkLogic-based project where the data consists of around
>> 100K XML documents. Each document belongs to one of 5 different publications,
>> which need to be differentiated for certain searches. I'm aware of at least
>> three methods of handling this differentiation:
>>
>> 1) assign each document to a collection and use cts:collection-query() or
>> equivalent;
>>
>> 2) load documents into subdirectories, one to each publication, and use
>> cts:directory-query() or equivalent;
>>
>> 3) store publication identifier in the XML data as an element, then create an
>> element range index to enable searches on it.
>>
>> Is there any way to guesstimate which of these approaches will have the best
>> performance when combined with various word and element queries, or will it
>> require empirical testing?
>>
>> David
>>
>> --
>> David Sewell, Editorial and Technical Manager
>> ROTUNDA, The University of Virginia Press
>> PO Box 400314, Charlottesville, VA 22904-4314 USA
>> Email: [email protected] Tel: +1 434 924 9973
>> Web: http://rotunda.upress.virginia.edu/
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
>>
>