Mattio, I think the answer is to use range indexes in the same way you would to build faceted navigation on book title, author, etc. You can do this in different ways, depending on where you store the book metadata. The key point is that a range index will return a list of unique values for an element across *all* fragments in the database.
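As a rough illustration of that key point, here is a toy in-memory sketch in Python. It is not MarkLogic code: the fragment data, the `range_index_values` helper, and the substring match standing in for a search are all assumptions made up for this example. It only models the idea that a range index yields the unique values of an element across all fragments, with a per-fragment count for each value (roughly what cts:frequency would report).

```python
from collections import Counter

# Hypothetical chapter fragments; each one carries its book's metadata.
FRAGMENTS = [
    {"book": "Anatomy Basics", "text": "the digestive system breaks down food"},
    {"book": "Anatomy Basics", "text": "the nervous system carries signals"},
    {"book": "Human Biology", "text": "disorders of the digestive system"},
    {"book": "Human Biology", "text": "the digestive system and the liver"},
    {"book": "Plant Life", "text": "photosynthesis in leaves"},
]


def range_index_values(fragments, element, phrase=None):
    """Return {value: fragment_count} for `element`.

    If `phrase` is given, only fragments whose text contains it are
    counted, which mimics constraining the values by a search.
    """
    counts = Counter(
        frag[element]
        for frag in fragments
        if phrase is None or phrase in frag["text"]
    )
    # Sort by value so the result reads like an ordered list of values.
    return dict(sorted(counts.items()))


# Every unique book title in the "database", with fragment counts.
all_titles = range_index_values(FRAGMENTS, "book")

# Only the titles with at least one fragment matching the phrase.
matching_titles = range_index_values(FRAGMENTS, "book", "digestive system")
```

In this sketch the keys of `matching_titles` are the books that contain the phrase in at least one fragment, however many fragments there are, and the counts are fragment counts rather than book counts.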
1) If you store each chapter as a document, and store the book metadata with each chapter, then your range index queries will work fine, but if you call each instance with cts:frequency, the counts would return chapter counts rather than book counts (personally, I feel this could be a very useful feature). You didn't say you were building faceted navigation, so you may not need the counts.

2) If you store the books as fragmented documents, and store the metadata as properties, then I think you can build a range index on the properties, and return a list of book metadata that way. Why store the metadata a second time in the properties? As I explained in the prior note, we can "join" across the property fragment to the document fragments efficiently. So this approach has the advantage of allowing you to search across your metadata and any chapter fragment, while also performing faceted navigation on the metadata with book-based counts. On the other hand, it requires 4.1.

In either approach, you will be able to return an exhaustive list of metadata that matches your search, independent of the number of results you return to the application. Furthermore, these searches on the range indexes will be very efficient.

How does this work? You can constrain the values in a range index based on any cts:query. Note that this will only be accurate if your cts:query can be accurately resolved from the indexes, which in turn depends on the version of the server, your search features, and the indexes you have enabled. These days the majority of cts:query expressions can be accurately resolved from the indexes with the appropriate indexes enabled, so I hope you will be in pretty good shape.

Finally, the property-based approach is something I've never tried, but based on what I know it should work.

Kelly

PS - I feel like all my answers on this list use range indexes lately. I guess I'm addicted.
:)

Message: 2
Date: Sat, 22 Aug 2009 21:55:37 -0400
From: Mattio Valentino <[email protected]>
Subject: Re: [MarkLogic Dev General] Searching large documents above the fragment root level.
To: General Mark Logic Developer Discussion <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1

Thanks both Mike and Kelly. I appreciate the responses, especially on the weekend!

The suggestions make perfect sense (and Mike knows I've worked with that chunking approach before). One area I'm still not sure about is how to return a reference to the book if *any* fragment/chunk contains the user's search terms. For example, if the user searches for "digestive system" I want to know which books contain the phrase anywhere within them.

The only idea I've had so far is to have the chunks loaded as you both describe, but then to have the book loaded again in its entirety as a single document, perhaps with the tagging and print index stripped out to help reduce its size. We did both this and the chunking in a previous system.

Is there another approach I'm not seeing?

Thanks again,
Mattio

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
