Sweet. That worked. I re-created the XML with categoryId as an element
underneath <category> instead of as an attribute. 

Then I used the paginated search to retrieve the records to display on the
page, and the following to set up the filtered (faceted) category search:

cts:element-values(xs:QName('category'), (), (), $searchQuery)

This last search returned all of the child elements under <category>,
which is all I needed in order to set up the display of the page. Pretty
darn simple once I knew what to do.
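
For anyone following along, here is a minimal sketch of how the two
searches fit together. It assumes a range index has been defined on the
<category> element (the index definition is not shown in this thread, so
that part is an assumption), the "FullText" field described in my
original post, and a hard-coded search term. The variable names are
only illustrative.

```xquery
xquery version "1.0-ml";

(: Assumption: "FullText" is the field from the original post, and a
   range index exists on <category> so cts:element-values can read it. :)
let $searchQuery := cts:field-word-query("FullText", "president")

(: Paginated search: only the first 10 matching entries are fetched. :)
let $page := cts:search(/entry, $searchQuery, "unfiltered")[1 to 10]

(: Facet values come from the range index, constrained by the same
   query, so the matching entries themselves are not retrieved. :)
let $facets :=
  for $v in cts:element-values(xs:QName("category"), (), (), $searchQuery)
  return element category {
    attribute frequency { cts:frequency($v) },
    $v
  }

return element results { $page, $facets }
```

The point of the design is that the facet list and the result page share
one query but are resolved separately, so the page size does not limit
which categories show up.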

MarkLogic went from taking twice as long as SQL Server to being 3-4
times faster.

Thanks for your help, Mike! Much appreciated.

-Grant

> -----Original Message-----
> From: Michael Blakeley
> Sent: Tuesday, December 30, 2008 10:20 AM

> 
> Grant,
> 
> Your query-trace output looks normal to me. I didn't expect 
> to see anything unusual there, but it's a good tool to know about.
> 
> Your category display sounds similar to a feature called 
> facets, or guided navigation. If so, you can avoid retrieving 
> all the results. Try cts:element-values() or
> cts:element-attribute-values() with the original user query,
> and a range index on the appropriate element or
> element-attribute pair.
> 
> For example, I think your categories could be represented 
> with an attribute range index on category/@categoryId as 
> integer. In that case the heart of the query might be 
> something like this:
> 
> let $query := cts:field-word-query("FullText", "president") 
> let $page := (
>    cts:search(/entry, $query, 'unfiltered')
> )[1 to 10]
> let $facets :=
>    for $v in cts:element-attribute-values(
>      xs:QName('category'), xs:QName('categoryId'),
>      (), ('frequency-order', 'type=integer'),
>      $query)
>    return element category {
>      attribute frequency { cts:frequency($v) },
>      $v
>    }
> return element results {
>    attribute remainder {
>      if ($page[1]) then cts:remainder($page[1]) else 0 },
>    element query { $query },
>    $page,
>    $facets
> }
> 
> You could also add a lookup step to get the longDescription 
> for each value, possibly using map:map(). Or that might be 
> better handled in your display layer.
> 
> thanks,
> -- Mike
> 
> On 2008-12-30 08:59, Grant Lindley wrote:
> > Thanks for your suggestions, Mike. See below.
> >
> >> I strongly recommend pagination in your query: see
> >> http://developer.marklogic.com/howto/tutorials/2006-09-paginated-search.xqy
> >
> > This greatly increases the performance, but there is a hitch. In my
> > case, there is a special requirement for the search results page that
> > all of the categories that have at least one matching record are to be
> > displayed. (Categories are things like map, image, biography, etc.)
> >
> > I think this means that I have to loop through all matching records in
> > order to grab all of the matched categories... unless there is a way
> > to craft a fast search that only pulls out the categories. Then I
> > could combine the fast category search with the fast paginated search.
> > I'll explore that some more.
> >
> >> As well as xdmp:query-meters(), you should consult
> >> xdmp:query-trace() - see
> >> http://developer.marklogic.com/pubs/4.0/books/performance.pdf
> >
> > Here's the output from query-meters() and query-trace(). I didn't see
> > anything, except I'm not sure what the value of the <qm:elapsed-time>
> > element means. (The search took approximately 5 seconds to return.)
> >
> > /eval line 1: Analyzing path for search: doc()
> > /eval line 1: Step 1 is searchable: doc()
> > /eval line 1: Path is fully searchable.
> > /eval line 1: Gathering constraints.
> > /eval line 1: Search query contributed 1 constraint:
> > cts:field-word-query("FullText", "president", ("lang=en"), 1)
> > /eval line 1: Executing search.
> > /eval line 1: Selected 4090 fragments
> > <qm:query-meters
> > xsi:schemaLocation="http://marklogic.com/xdmp/query-meters
> > query-meters.xsd" xmlns:qm="http://marklogic.com/xdmp/query-meters"
> > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
> > <qm:elapsed-time>PT0S</qm:elapsed-time>
> > <qm:requests>1</qm:requests>
> > <qm:list-cache-hits>4</qm:list-cache-hits>
> > <qm:list-cache-misses>0</qm:list-cache-misses>
> > <qm:in-memory-list-hits>0</qm:in-memory-list-hits>
> > <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
> > <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
> > <qm:compressed-tree-cache-hits>0</qm:compressed-tree-cache-hits>
> > <qm:compressed-tree-cache-misses>0</qm:compressed-tree-cache-misses>
> > <qm:in-memory-compressed-tree-hits>0</qm:in-memory-compressed-tree-hits>
> > <qm:value-cache-hits>0</qm:value-cache-hits>
> > <qm:value-cache-misses>0</qm:value-cache-misses>
> > <qm:regexp-cache-hits>0</qm:regexp-cache-hits>
> > <qm:regexp-cache-misses>0</qm:regexp-cache-misses>
> > <qm:link-cache-hits>0</qm:link-cache-hits>
> > <qm:link-cache-misses>0</qm:link-cache-misses>
> > <qm:fragments-added>0</qm:fragments-added>
> > <qm:fragments-deleted>0</qm:fragments-deleted>
> > <qm:fs-program-cache-hits>0</qm:fs-program-cache-hits>
> > <qm:fs-program-cache-misses>0</qm:fs-program-cache-misses>
> > <qm:db-program-cache-hits>0</qm:db-program-cache-hits>
> > <qm:db-program-cache-misses>0</qm:db-program-cache-misses>
> > <qm:fs-main-module-sequence-cache-hits>0</qm:fs-main-module-sequence-cache-hits>
> > <qm:fs-main-module-sequence-cache-misses>0</qm:fs-main-module-sequence-cache-misses>
> > <qm:db-main-module-sequence-cache-hits>0</qm:db-main-module-sequence-cache-hits>
> > <qm:db-main-module-sequence-cache-misses>0</qm:db-main-module-sequence-cache-misses>
> > <qm:fs-library-module-cache-hits>0</qm:fs-library-module-cache-hits>
> > <qm:fs-library-module-cache-misses>0</qm:fs-library-module-cache-misses>
> > <qm:db-library-module-cache-hits>0</qm:db-library-module-cache-hits>
> > <qm:db-library-module-cache-misses>0</qm:db-library-module-cache-misses>
> > <qm:fragments>
> > <qm:fragment>
> > <qm:root xmlns="">entry</qm:root>
> > <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
> > <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
> > </qm:fragment>
> > </qm:fragments>
> > <qm:documents>
> > <qm:document>
> > <qm:uri>/C/TEMP/EBookDump/436672.xml</qm:uri>
> > <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
> > <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
> > </qm:document>
> > </qm:documents>
> >
> >> Looking at the structure of your documents, I'd try storing each 
> >> entry as a separate document. So your search would become /entry 
> >> rather than /content/entry.
> >
> > I removed the content element and created and loaded a separate
> > document for each record. This didn't change the performance, however.
> >
> >
> >> On 2008-12-29 11:58, Grant Lindley wrote:
> >>> I'm comparing full-text search performance between MarkLogic 4.0 and
> >>> SQL Server 2005 from a C# .NET web page.
> >>>
> >>> So far searches take about twice as long in MarkLogic compared to SQL
> >>> Server, and I'm looking for suggestions to improve performance in ML.
> >>> The test data consists of 14,035 searchable records that take up 52 MB
> >>> in an XML text file.
> >>>
> >>> Here's a sample record:
> >>>
> >>> <content>
> >>>     <entry entryId="121866">
> >>>       <title>Alvar Aalto</title>
> >>>       <sortTitle>Aalto, Alvar</sortTitle>
> >>>       <searchTitle>Aalto, Alvar</searchTitle>
> >>>       <synopsis>Finland's most distinguished designer, Alvar Aalto is
> >>> renowned for his building designs as well as for his unique birchwood
> >>> furniture designs that are the archetype of Finnish furniture.
> >>> </synopsis>
> >>>       <mainText>   Finland's most distinguished architect and designer, ...
> >>> [long text removed]</mainText>
> >>>       <entryDate></entryDate>
> >>>       <searchExclude>False</searchExclude>
> >>>       <hyperlink>False</hyperlink>
> >>>       <furtherReading>Alvar Aalto Museum Web Site 
> >>> (http://www.alvaraalto.fi)</furtherReading>
> >>>       <siteCredits>ABC-CLIO</siteCredits>
> >>>       <citationCredits></citationCredits>
> >>>       <citationCredits2></citationCredits2>
> >>>       <accentUpdated>True</accentUpdated>
> >>>       <category categoryId="22">
> >>>         <displayTitle>Individuals</displayTitle>
> >>>         <formOrder>30</formOrder>
> >>>         <filterable>True</filterable>
> >>>         <categoryTypeId>5</categoryTypeId>
> >>>         <longDescription>Individuals</longDescription>
> >>>       </category>
> >>>       <subTopic subTopicId="62" topicId="3">
> >>>         <displayTitle>Finland</displayTitle>
> >>>         <description>Finland</description>
> >>>         <sortOrder>-1</sortOrder>
> >>>       </subTopic>
> >>>       <topic topicId="3">
> >>>         <description>Europe</description>
> >>>       </topic>
> >>>     </entry>
> >>> </content>
> >>>
> >>> The elements that are included in the search are title, sortTitle,
> >>> mainText, and siteCredits.
> >>>
> >>> For the MarkLogic index settings, I have selected only basic stemmed
> >>> searches and fast phrase searches.
> >>>
> >>> The best results so far have been obtained when the entry element has
> >>> been added as a fragment root.
> >>>
> >>> Here's the code currently being used to execute the search:
> >>>
> >>>     cts:search(fn:doc()//content/entry,
> >>>       cts:field-word-query("FullText", "president"), "unfiltered")
> >>>
> >>> where "FullText" is a field that has been set up with the four 
> >>> searchable elements above.
> >>>
> >>> I tried running with xdmp:query-meters() and didn't find any cache
> >>> misses.
> >>>
> >>> I'm experienced with SQL Server, but brand new to MarkLogic, so any
> >>> suggestions would be appreciated.
> >>>
> >>> -Grant
> >>>
> >>>
> >>> ----------------------------------------------------------------------
> >>> _______________________________________________
> >>> General mailing list
> >>> [email protected]
> >>> http://xqzone.com/mailman/listinfo/general