Mattio,

Searches return fragments, and the size and number of fragments returned by a 
search often determine performance, especially where disk I/O is 
the bottleneck. It sounds as if the queries you are issuing are perhaps 
returning the entire document rather than the parent fragment that links to 
your chapter fragments, but it is difficult to know based on the information 
you've provided. I *think* there is a way to return only the parent fragment in 
your searches, but I confess that I don't know it off the top of my head (perhaps 
a colleague will remind us). Frankly, we discourage the use of fragments 
because they add a layer of complexity to most applications. We of course 
support them, and they are very useful in some cases, but typically we try to 
stay away from them. Mike S. suggested what most people do, which is to break 
the books into chapter documents, and to store the book metadata on each 
chapter so that you can search metadata + the contents of each chapter at the 
same time.
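
To make the chapter-document idea concrete, here is a minimal sketch in plain 
Python (not MarkLogic code). It only illustrates the shape of the data: each 
chapter becomes its own document carrying a copy of the book metadata, so one 
search can match metadata or chapter text and still report the book title. 
The element names (book, metadata, chapter, chapter-doc), the sample book, and 
both function names are assumptions made up for illustration; your schema 
and loading process will differ.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical book layout; all element names and content are illustrative.
BOOK_XML = """
<book>
  <metadata>
    <title>Example Book</title>
    <author>A. Writer</author>
    <pub-year>2009</pub-year>
  </metadata>
  <chapter n="1"><p>Fragments and how searches return them.</p></chapter>
  <chapter n="2"><p>Properties and how they are indexed.</p></chapter>
</book>
"""


def split_into_chapter_docs(book_xml):
    """Return one standalone XML document (as a string) per chapter,
    each carrying a copy of the book-level metadata."""
    book = ET.fromstring(book_xml)
    metadata = book.find("metadata")
    docs = []
    for chapter in book.findall("chapter"):
        doc = ET.Element("chapter-doc")
        doc.append(copy.deepcopy(metadata))  # duplicate metadata per chapter
        doc.append(copy.deepcopy(chapter))
        docs.append(ET.tostring(doc, encoding="unicode"))
    return docs


def titles_matching(chapter_docs, term):
    """Naive title-level search: return the distinct titles of books whose
    metadata or chapter text contains the term in any chapter document."""
    term = term.lower()
    titles = []
    for xml_text in chapter_docs:
        doc = ET.fromstring(xml_text)
        text = " ".join(doc.itertext()).lower()
        title = doc.findtext("metadata/title")
        if term in text and title not in titles:
            titles.append(title)
    return titles


CHAPTER_DOCS = split_into_chapter_docs(BOOK_XML)
```

Calling titles_matching(CHAPTER_DOCS, "indexed") finds the book through its 
second chapter, and titles_matching(CHAPTER_DOCS, "writer") finds it through 
the copied metadata. The trade-off is that the metadata is duplicated on every 
chapter, so a metadata change has to be written to each chapter document.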

If you keep each book as a single document, one option you might consider is to 
use properties, which themselves are a special type of fragment. Fortunately, 
properties are indexed per your database settings, so you could issue the same 
search, but instead of querying the document, you can query the properties. 
There are a few ways to do this: with xdmp:directory-properties() or 
xdmp:collection-properties() if you have your documents in directories or 
collections, and in 4.1 you have cts:properties-query().

The advantage of the property approach is that in 4.1 you can "join" across 
your property fragment and your chapter fragments in your searches. So, you 
could search your chapters and your front matter (e.g., title, author, pub-year, 
etc) in the same search. I have to imagine that this will become a requirement 
at some point. :-)

Kelly


Message: 3
Date: Fri, 21 Aug 2009 20:58:52 -0400
From: Mattio Valentino <[email protected]>
Subject: [MarkLogic Dev General] Searching large documents above the
        fragment root level.
To: General Mark Logic Developer Discussion
        <[email protected]>
Message-ID:
        <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1

I have large documents stored in MarkLogic -- books.  My fragment
roots are set to the chapter level because we display material at that
level and we have a search feature at that level. Performance is good
with those queries.

We also have a feature where we want to search at the title level
where title metadata is returned as a result if it contains the search
term anywhere within it.

I've written this query a number of different ways and I can't get
good performance out of it.  There are a number of requirements I'm
leaving out, but does anyone have a pattern or general strategy for
these types of queries where you are searching at the document level
instead of the fragment root level?

Thanks,
Mattio
