Grant,

Your query-trace output looks normal to me. I didn't expect to see anything unusual there, but it's a good tool to know about.

Your category display sounds similar to a feature called facets, or guided navigation. If so, you can avoid retrieving all the results. Try cts:element-values() or cts:element-attribute-values() with the original user query, and a range index on the appropriate element or element-attribute pair.

For example, I think your categories could be represented with an attribute range index on category/@categoryId as integer. In that case the heart of the query might be something like this:

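(: the user's original query against the FullText field :)
let $query := cts:field-word-query("FullText", "president")
(: first page only: ten entries, not the whole result set :)
let $page := (
  cts:search(/entry, $query, 'unfiltered')
)[1 to 10]
(: one category element per distinct categoryId among the matches,
   most frequent first, read from the attribute range index :)
let $facets :=
  for $v in cts:element-attribute-values(
    xs:QName('category'), xs:QName('categoryId'),
    (), ('frequency-order', 'type=integer'),
    $query)
  return element category {
    attribute frequency { cts:frequency($v) },
    $v
  }
return element results {
  (: estimated result count, taken from the first hit if there is one :)
  attribute remainder {
    if ($page[1]) then cts:remainder($page[1]) else 0 },
  element query { $query },
  $page,
  $facets
}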

You could also add a lookup step to get the longDescription for each value, possibly using map:map(). Or that might be better handled in your display layer.
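
A rough sketch of that lookup step, reusing the $query and the range index assumed above. The $labels map, the label attribute, and the choice to read longDescription from the first entry that carries each categoryId are illustrative assumptions, not something tested against your data; it also assumes a release where map:map() is available.

```xquery
let $query := cts:field-word-query("FullText", "president")
(: distinct categoryId values among the matches, as in the query above :)
let $values := cts:element-attribute-values(
  xs:QName('category'), xs:QName('categoryId'),
  (), ('frequency-order', 'type=integer'),
  $query)
(: hypothetical lookup table: categoryId (as string) -> longDescription :)
let $labels := map:map()
let $fill :=
  for $v in $values
  (: assumption: any entry with this categoryId carries the same description :)
  let $label := string((/entry/category[@categoryId = $v])[1]/longDescription)
  return map:put($labels, string($v), $label)
return
  for $v in $values
  return element category {
    attribute frequency { cts:frequency($v) },
    attribute label { map:get($labels, string($v)) },
    $v
  }
```

Within a single request the map adds little over doing the XPath lookup inline; it would mostly pay off if the id-to-description table were built once and reused.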

thanks,
-- Mike

On 2008-12-30 08:59, Grant Lindley wrote:
Thanks for your suggestions, Mike. See below.

I strongly recommend pagination in your query: see

http://developer.marklogic.com/howto/tutorials/2006-09-paginated-search.xqy

This greatly increases the performance, but there is a hitch. In my
case, there is a special requirement for the search results page that
all of the categories that have at least one matching record are to be
displayed. (Categories are things like map, image, biography, etc.)

I think this means that I have to loop through all matching records in
order to grab all of the matched categories... unless there is a way to
craft a fast search that only pulls out the categories. Then I could
combine the fast category search with the fast paginated search. I'll
explore that some more.

As well as xdmp:query-meters(), you should consult
xdmp:query-trace() - see
http://developer.marklogic.com/pubs/4.0/books/performance.pdf

Here's the output from query-meters() and query-trace(). I didn't see
anything unusual, though I'm not sure what the value of the
<qm:elapsed-time> element means. (The search took approximately 5
seconds to return.)

/eval line 1: Analyzing path for search: doc()
/eval line 1: Step 1 is searchable: doc()
/eval line 1: Path is fully searchable.
/eval line 1: Gathering constraints.
/eval line 1: Search query contributed 1 constraint:
cts:field-word-query("FullText", "president", ("lang=en"), 1)
/eval line 1: Executing search.
/eval line 1: Selected 4090 fragments
<qm:query-meters
xsi:schemaLocation="http://marklogic.com/xdmp/query-meters query-meters.xsd"
xmlns:qm="http://marklogic.com/xdmp/query-meters"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<qm:elapsed-time>PT0S</qm:elapsed-time>
<qm:requests>1</qm:requests>
<qm:list-cache-hits>4</qm:list-cache-hits>
<qm:list-cache-misses>0</qm:list-cache-misses>
<qm:in-memory-list-hits>0</qm:in-memory-list-hits>
<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
<qm:compressed-tree-cache-hits>0</qm:compressed-tree-cache-hits>
<qm:compressed-tree-cache-misses>0</qm:compressed-tree-cache-misses>
<qm:in-memory-compressed-tree-hits>0</qm:in-memory-compressed-tree-hits>
<qm:value-cache-hits>0</qm:value-cache-hits>
<qm:value-cache-misses>0</qm:value-cache-misses>
<qm:regexp-cache-hits>0</qm:regexp-cache-hits>
<qm:regexp-cache-misses>0</qm:regexp-cache-misses>
<qm:link-cache-hits>0</qm:link-cache-hits>
<qm:link-cache-misses>0</qm:link-cache-misses>
<qm:fragments-added>0</qm:fragments-added>
<qm:fragments-deleted>0</qm:fragments-deleted>
<qm:fs-program-cache-hits>0</qm:fs-program-cache-hits>
<qm:fs-program-cache-misses>0</qm:fs-program-cache-misses>
<qm:db-program-cache-hits>0</qm:db-program-cache-hits>
<qm:db-program-cache-misses>0</qm:db-program-cache-misses>
<qm:fs-main-module-sequence-cache-hits>0</qm:fs-main-module-sequence-cache-hits>
<qm:fs-main-module-sequence-cache-misses>0</qm:fs-main-module-sequence-cache-misses>
<qm:db-main-module-sequence-cache-hits>0</qm:db-main-module-sequence-cache-hits>
<qm:db-main-module-sequence-cache-misses>0</qm:db-main-module-sequence-cache-misses>
<qm:fs-library-module-cache-hits>0</qm:fs-library-module-cache-hits>
<qm:fs-library-module-cache-misses>0</qm:fs-library-module-cache-misses>
<qm:db-library-module-cache-hits>0</qm:db-library-module-cache-hits>
<qm:db-library-module-cache-misses>0</qm:db-library-module-cache-misses>
<qm:fragments>
<qm:fragment>
<qm:root xmlns="">entry</qm:root>
<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:fragment>
</qm:fragments>
<qm:documents>
<qm:document>
<qm:uri>/C/TEMP/EBookDump/436672.xml</qm:uri>
<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:document>
</qm:documents>

Looking at the structure of your documents, I'd try storing
each entry as a separate document. So your search would
become /entry rather than /content/entry.

I removed the content element and created and loaded a separate document
for each record. This didn't change the performance, however.
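
For anyone repeating this step inside MarkLogic rather than reloading from files, a minimal sketch follows. The source URI /content.xml and the /entries/ URI prefix are made-up names for illustration, and the sketch assumes the combined file was loaded as a single document.

```xquery
(: hypothetical URIs: "/content.xml" is the combined document,
   "/entries/<entryId>.xml" is where each split entry goes :)
for $entry in fn:doc("/content.xml")/content/entry
let $uri := fn:concat("/entries/", $entry/@entryId, ".xml")
(: each entry element becomes the root of its own document :)
return xdmp:document-insert($uri, $entry)
```

The combined document is left in place by this sketch and would need to be deleted separately so entries are not matched twice. With 14,035 entries, running the inserts in batches may be kinder than one large transaction.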


On 2008-12-29 11:58, Grant Lindley wrote:
I'm comparing full-text search performance between MarkLogic 4.0 and
SQL Server 2005 from a C# .NET web page.

So far searches take about twice as long in MarkLogic compared to SQL
Server, and I'm looking for suggestions to improve performance in ML.
The test data consists of 14,035 searchable records that take up 52 MB
in an XML text file.

Here's a sample record:

<content>
    <entry entryId="121866">
      <title>Alvar Aalto</title>
      <sortTitle>Aalto, Alvar</sortTitle>
      <searchTitle>Aalto, Alvar</searchTitle>
      <synopsis>Finland's most distinguished designer, Alvar Aalto is
renowned for his building designs as well as for his unique birchwood
furniture designs that are the archetype of Finnish furniture.
</synopsis>
      <mainText>   Finland's most distinguished architect and designer, ...
[long text removed]</mainText>
      <entryDate></entryDate>
      <searchExclude>False</searchExclude>
      <hyperlink>False</hyperlink>
      <furtherReading>Alvar Aalto Museum Web Site
(http://www.alvaraalto.fi)</furtherReading>
      <siteCredits>ABC-CLIO</siteCredits>
      <citationCredits></citationCredits>
      <citationCredits2></citationCredits2>
      <accentUpdated>True</accentUpdated>
      <category categoryId="22">
        <displayTitle>Individuals</displayTitle>
        <formOrder>30</formOrder>
        <filterable>True</filterable>
        <categoryTypeId>5</categoryTypeId>
        <longDescription>Individuals</longDescription>
      </category>
      <subTopic subTopicId="62" topicId="3">
        <displayTitle>Finland</displayTitle>
        <description>Finland</description>
        <sortOrder>-1</sortOrder>
      </subTopic>
      <topic topicId="3">
        <description>Europe</description>
      </topic>
    </entry>
</content>

The elements that are included in the search are title, sortTitle,
mainText, and siteCredits.

For the MarkLogic index settings, I have selected only basic stemmed
searches and fast phrase searches.

The best results so far have been obtained when the entry element has
been added as a fragment root.

Here's the code currently being used to execute the search:

    cts:search(fn:doc()//content/entry,
        cts:field-word-query("FullText", "president"), "unfiltered")

where "FullText" is a field that has been set up with the four
searchable elements above.

I tried running with xdmp:query-meters() and didn't find any cache
misses.

I'm experienced with SQL Server, but brand new to MarkLogic, so any
suggestions would be appreciated.

-Grant





------------------------------------------------------------------------

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general