Sweet. That worked. I re-created the XML with categoryId as an element
underneath <category> instead of as an attribute.
Then I used the paginated search to retrieve the records to display on the
page, and the following to set up the filtered (faceted) category search:
cts:element-values(xs:QName('category'), (), (), $searchQuery)
This last search returned all of the child elements under <category>,
which is all I needed in order to set up the display of the page. Pretty
darn simple once I knew what to do.
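
For anyone reading this in the archive, here is a minimal sketch of how the two pieces can fit together in one request. It is an illustration rather than my exact page code: the search term, the page size of 10, the <results>, <page> and <categories> wrapper elements, and the 'frequency-order' option (borrowed from Mike's example below) are assumptions, and it presumes an element range index on category.

```xquery
xquery version "1.0-ml";

(: Hypothetical search term; the real page takes it from the request. :)
let $searchQuery := cts:field-word-query("FullText", "president")

(: First page of matching entries, resolved from the indexes only. :)
let $page := cts:search(/entry, $searchQuery, "unfiltered")[1 to 10]

(: Category values across ALL matches, read from the range index
   rather than from the documents themselves.
   Assumes an element range index is configured on "category". :)
let $categories :=
  for $v in cts:element-values(
              xs:QName("category"), (), ("frequency-order"), $searchQuery)
  return
    <category frequency="{ cts:frequency($v) }">{ $v }</category>

return
  <results>
    <page>{ $page }</page>
    <categories>{ $categories }</categories>
  </results>
```

The point of the split is that only ten documents are ever fetched, while the category list still reflects every match because it comes from the index.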
MarkLogic went from taking twice as long as SQL Server to being 3-4
times faster.
Thanks for your help, Mike! Much appreciated.
-Grant
> -----Original Message-----
> Michael Blakeley
> Sent: Tuesday, December 30, 2008 10:20 AM
>
> Grant,
>
> Your query-trace output looks normal to me. I didn't expect
> to see anything unusual there, but it's a good tool to know about.
>
> Your category display sounds similar to a feature called
> facets, or guided navigation. If so, you can avoid retrieving
> all the results. Try
> cts:element-values() or cts:element-attribute-values() with
> the original user query, and a range index on the appropriate
> element or element-attribute pair.
>
> For example, I think your categories could be represented
> with an attribute range index on category/@categoryId as
> integer. In that case the heart of the query might be
> something like this:
>
> let $query := cts:field-word-query("FullText", "president")
> let $page := (
>   cts:search(/entry, $query, 'unfiltered')
> )[1 to 10]
> let $facets :=
>   for $v in cts:element-attribute-values(
>     xs:QName('category'), xs:QName('categoryId'),
>     (), ('frequency-order', 'type=integer'),
>     $query)
>   return element category {
>     attribute frequency { cts:frequency($v) },
>     $v
>   }
> return element results {
>   attribute remainder {
>     if ($page[1]) then cts:remainder($page[1]) else 0 },
>   element query { $query },
>   $page,
>   $facets
> }
>
> You could also add a lookup step to get the longDescription
> for each value, possibly using map:map(). Or that might be
> better handled in your display layer.
>
> thanks,
> -- Mike
>
> On 2008-12-30 08:59, Grant Lindley wrote:
> > Thanks for your suggestions, Mike. See below.
> >
> >> I strongly recommend pagination in your query: see
> >> http://developer.marklogic.com/howto/tutorials/2006-09-paginated-search.xqy
> >
> > This greatly increases the performance, but there is a hitch. In my
> > case, there is a special requirement for the search results page that
> > all of the categories that have at least one matching record are to be
> > displayed. (Categories are things like map, image, biography, etc.)
> >
> > I think this means that I have to loop through all matching records in
> > order to grab all of the matched categories... unless there is a way
> > to craft a fast search that only pulls out the categories. Then I
> > could combine the fast category search with the fast paginated search.
> > I'll explore that some more.
> >
> >> As well as xdmp:query-meters(), you should consult
> >> xdmp:query-trace() - see
> >> http://developer.marklogic.com/pubs/4.0/books/performance.pdf
> >
> > Here's the output from query-meters() and query-trace(). I didn't see
> > anything unusual, except I'm not sure what the value of the
> > <qm:elapsed-time> element means. (The search took approximately 5
> > seconds to return.)
> >
> > /eval line 1: Analyzing path for search: doc()
> > /eval line 1: Step 1 is searchable: doc()
> > /eval line 1: Path is fully searchable.
> > /eval line 1: Gathering constraints.
> > /eval line 1: Search query contributed 1 constraint:
> >   cts:field-word-query("FullText", "president", ("lang=en"), 1)
> > /eval line 1: Executing search.
> > /eval line 1: Selected 4090 fragments
> > <qm:query-meters
> > xsi:schemaLocation="http://marklogic.com/xdmp/query-meters
> > query-meters.xsd" xmlns:qm="http://marklogic.com/xdmp/query-meters"
> > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
> > <qm:elapsed-time>PT0S</qm:elapsed-time>
> > <qm:requests>1</qm:requests>
> > <qm:list-cache-hits>4</qm:list-cache-hits>
> > <qm:list-cache-misses>0</qm:list-cache-misses>
> > <qm:in-memory-list-hits>0</qm:in-memory-list-hits>
> > <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
> > <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
> > <qm:compressed-tree-cache-hits>0</qm:compressed-tree-cache-hits>
> > <qm:compressed-tree-cache-misses>0</qm:compressed-tree-cache-misses>
> > <qm:in-memory-compressed-tree-hits>0</qm:in-memory-compressed-tree-hits>
> > <qm:value-cache-hits>0</qm:value-cache-hits>
> > <qm:value-cache-misses>0</qm:value-cache-misses>
> > <qm:regexp-cache-hits>0</qm:regexp-cache-hits>
> > <qm:regexp-cache-misses>0</qm:regexp-cache-misses>
> > <qm:link-cache-hits>0</qm:link-cache-hits>
> > <qm:link-cache-misses>0</qm:link-cache-misses>
> > <qm:fragments-added>0</qm:fragments-added>
> > <qm:fragments-deleted>0</qm:fragments-deleted>
> > <qm:fs-program-cache-hits>0</qm:fs-program-cache-hits>
> > <qm:fs-program-cache-misses>0</qm:fs-program-cache-misses>
> > <qm:db-program-cache-hits>0</qm:db-program-cache-hits>
> > <qm:db-program-cache-misses>0</qm:db-program-cache-misses>
> > <qm:fs-main-module-sequence-cache-hits>0</qm:fs-main-module-sequence-cache-hits>
> > <qm:fs-main-module-sequence-cache-misses>0</qm:fs-main-module-sequence-cache-misses>
> > <qm:db-main-module-sequence-cache-hits>0</qm:db-main-module-sequence-cache-hits>
> > <qm:db-main-module-sequence-cache-misses>0</qm:db-main-module-sequence-cache-misses>
> > <qm:fs-library-module-cache-hits>0</qm:fs-library-module-cache-hits>
> > <qm:fs-library-module-cache-misses>0</qm:fs-library-module-cache-misses>
> > <qm:db-library-module-cache-hits>0</qm:db-library-module-cache-hits>
> > <qm:db-library-module-cache-misses>0</qm:db-library-module-cache-misses>
> > <qm:fragments>
> > <qm:fragment>
> > <qm:root xmlns="">entry</qm:root>
> > <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
> > <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
> > </qm:fragment>
> > </qm:fragments>
> > <qm:documents>
> > <qm:document>
> > <qm:uri>/C/TEMP/EBookDump/436672.xml</qm:uri>
> > <qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
> > <qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
> > </qm:document>
> > </qm:documents>
> >
> >> Looking at the structure of your documents, I'd try storing each
> >> entry as a separate document. So your search would become /entry
> >> rather than /content/entry.
> >
> > I removed the content element and created and loaded a separate
> > document for each record. This didn't change the performance, however.
> >
> >
> >> On 2008-12-29 11:58, Grant Lindley wrote:
> >>> I'm comparing full-text search performance between MarkLogic 4.0 and
> >>> SQL Server 2005 from a C# .NET web page.
> >>>
> >>> So far searches take about twice as long in MarkLogic compared to SQL
> >>> Server, and I'm looking for suggestions to improve performance in ML.
> >>> The test data consists of 14,035 searchable records that take up 52 MB
> >>> in an XML text file.
> >>>
> >>> Here's a sample record:
> >>>
> >>> <content>
> >>>   <entry entryId="121866">
> >>>     <title>Alvar Aalto</title>
> >>>     <sortTitle>Aalto, Alvar</sortTitle>
> >>>     <searchTitle>Aalto, Alvar</searchTitle>
> >>>     <synopsis>Finland's most distinguished designer, Alvar Aalto is
> >>>       renowned for his building designs as well as for his unique birchwood
> >>>       furniture designs that are the archetype of Finnish furniture.
> >>>     </synopsis>
> >>>     <mainText> Finland's most distinguished architect and designer, ...
> >>>       [long text removed]</mainText>
> >>>     <entryDate></entryDate>
> >>>     <searchExclude>False</searchExclude>
> >>>     <hyperlink>False</hyperlink>
> >>>     <furtherReading>Alvar Aalto Museum Web Site
> >>>       (http://www.alvaraalto.fi)</furtherReading>
> >>>     <siteCredits>ABC-CLIO</siteCredits>
> >>>     <citationCredits></citationCredits>
> >>>     <citationCredits2></citationCredits2>
> >>>     <accentUpdated>True</accentUpdated>
> >>>     <category categoryId="22">
> >>>       <displayTitle>Individuals</displayTitle>
> >>>       <formOrder>30</formOrder>
> >>>       <filterable>True</filterable>
> >>>       <categoryTypeId>5</categoryTypeId>
> >>>       <longDescription>Individuals</longDescription>
> >>>     </category>
> >>>     <subTopic subTopicId="62" topicId="3">
> >>>       <displayTitle>Finland</displayTitle>
> >>>       <description>Finland</description>
> >>>       <sortOrder>-1</sortOrder>
> >>>     </subTopic>
> >>>     <topic topicId="3">
> >>>       <description>Europe</description>
> >>>     </topic>
> >>>   </entry>
> >>> </content>
> >>>
> >>> The elements that are included in the search are title, sortTitle,
> >>> mainText, and siteCredits.
> >>>
> >>> For the MarkLogic index settings, I have selected only basic stemmed
> >>> searches and fast phrase searches.
> >>>
> >>> The best results so far have been obtained when the entry element has
> >>> been added as a fragment root.
> >>>
> >>> Here's the code currently being used to execute the search:
> >>>
> >>> cts:search(fn:doc()//content/entry,
> >>> cts:field-word-query("FullText", "president"), "unfiltered" )
> >>>
> >>> where "FullText" is a field that has been set up with the four
> >>> searchable elements above.
> >>>
> >>> I tried running with xdmp:query-meters() and didn't find any cache
> >>> misses.
> >>>
> >>> I'm experienced with SQL Server, but brand new to MarkLogic, so any
> >>> suggestions would be appreciated.
> >>>
> >>> -Grant
> >>>
> >>> _______________________________________________
> >>> General mailing list
> >>> [email protected]
> >>> http://xqzone.com/mailman/listinfo/general