[MarkLogic Dev General] RE: Searching large documents above the fragment level

Kelly Stirman Sun, 23 Aug 2009 13:00:04 -0700

Mattio,

I think the answer is to use range indexes in the same way you would to build 
faceted navigation on book title, author, etc. You can do this in different 
ways, depending on where you store the book metadata. The key point is that a 
range index will return a list of unique values for an element across *all* 
fragments in the database.


1) If you store each chapter as a document, and store the book metadata with 
each chapter, then your range index queries will work fine. However, if you 
call cts:frequency on each value returned, you will get chapter counts rather 
than book counts (personally, I feel this could be a very useful feature). You 
didn't say you were building faceted navigation, so you may not need the 
counts. 

2) If you store the books as fragmented documents, and store the metadata as 
properties, then I think you can build a range index on the properties, and 
return a list of book metadata that way. Why store the metadata a second time 
in the properties? As I explained in the prior note, we can "join" across the 
property fragment to the document fragments efficiently. So this approach has 
the advantage of allowing you to search across your metadata and any chapter 
fragment, while also performing faceted navigation on the metadata with 
book-based counts. On the other hand, it requires MarkLogic Server 4.1.

In either approach, you will be able to return an exhaustive list of metadata 
that matches your search, independent of the number of results you return to 
the application. Furthermore, these searches on the range indexes will be very 
efficient.
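
To make the difference between chapter counts and book counts concrete, here 
is a toy model in Python. It is not MarkLogic code and calls no MarkLogic API; 
the sample titles and the dictionary layout are invented for illustration. It 
only mimics the idea that a value's frequency is counted once per fragment 
that holds the value.

```python
from collections import Counter

# Hypothetical sample data: one entry per chapter document, each carrying
# a copy of its book's title (the first approach).
chapters = [
    {"book": "Anatomy Basics", "chapter": 1},
    {"book": "Anatomy Basics", "chapter": 2},
    {"book": "Anatomy Basics", "chapter": 3},
    {"book": "Physiology Primer", "chapter": 1},
]

# First approach: every chapter fragment holds the title, so the count
# for a title is the number of chapters that carry it.
chapter_counts = Counter(c["book"] for c in chapters)

# Second approach: the title is stored once per book (modelled here by
# de-duplicating into a set), so each title is counted once per book.
book_counts = Counter({c["book"] for c in chapters})

print(chapter_counts["Anatomy Basics"])  # 3
print(book_counts["Anatomy Basics"])     # 1
```

If the facet counts shown to the user should mean "number of books", the 
second layout gives that directly; with the first, the counts mean "number of 
matching chapters".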

How does this work? You can constrain the values in a range index based on any 
cts:query. Note that this will only be accurate if your cts:query can be 
accurately resolved from the indexes, which in turn depends on the version of 
the server, your search features, and the indexes you have enabled. These days 
the majority of cts:query expressions can be accurately resolved from the 
indexes with the appropriate indexes enabled, so I hope you will be in pretty 
good shape.
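
As a rough illustration of what "constraining the values by a query" means, 
here is a toy sketch in Python. It is not MarkLogic code: the fragment layout, 
the sample text, and the helper name titles_matching are all made up, and a 
real server would answer this from its indexes rather than by scanning text. 
The point is only that the list of distinct titles comes from whichever 
fragments match the query, regardless of how many results are paged back to 
the application.

```python
# Hypothetical fragments: each chapter carries its book's title and its text.
fragments = [
    {"title": "Anatomy Basics", "text": "The digestive system breaks down food."},
    {"title": "Anatomy Basics", "text": "The skeleton supports the body."},
    {"title": "Physiology Primer", "text": "Hormones regulate the digestive system."},
    {"title": "Botany Notes", "text": "Leaves capture sunlight."},
]


def titles_matching(fragments, phrase):
    """Return the sorted distinct titles of fragments whose text contains phrase."""
    needle = phrase.lower()
    # Keep only fragments that match the "query", then collect unique titles.
    matches = {f["title"] for f in fragments if needle in f["text"].lower()}
    return sorted(matches)


print(titles_matching(fragments, "digestive system"))
# ['Anatomy Basics', 'Physiology Primer']
```

This is the shape of the answer to "which books contain the phrase anywhere 
within them": one entry per book, even when several chapters match.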

Finally, the property-based approach is something I've never tried, but based 
on what I know it should work.

Kelly

PS - I feel like all my answers on this list use range indexes lately. I guess 
I'm addicted. :)


Message: 2
Date: Sat, 22 Aug 2009 21:55:37 -0400
From: Mattio Valentino <[email protected]>
Subject: Re: [MarkLogic Dev General] Searching large documents above
        the fragment root level.
To: General Mark Logic Developer Discussion
        <[email protected]>
Message-ID:
        <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1

Thanks both Mike and Kelly.  I appreciate the responses, especially on the 
weekend!

The suggestions make perfect sense (and Mike knows I've worked with that 
chunking approach before).

One area I'm still not sure about is how to return a reference to the book if 
*any* fragment/chunk contains the user's search terms.  For example, if the 
user searches for "digestive system" I want to know which books contain the 
phrase anywhere within them.  The only idea I've had so far is to have the 
chunks loaded as you both describe, but then to have the book loaded again in 
its entirety as a single document, but perhaps with the tagging and print 
index stripped out to help reduce its size.  We did both this and the chunking 
in a previous system.

Is there another approach I'm not seeing?

Thanks again,
Mattio
