RE: [MarkLogic Dev General] MarkLogic vs SQL Server search performance

Grant Lindley Tue, 30 Dec 2008 08:59:14 -0800

Thanks for your suggestions, Mike. See below.

> I strongly recommend pagination in your query: see
> http://developer.marklogic.com/howto/tutorials/2006-09-paginated-search.xqy

Pagination greatly improves performance, but there is a hitch. In my
case, the search results page must display every category that has at
least one matching record. (Categories are things like map, image,
biography, etc.)

I think this means that I have to loop through all matching records in
order to grab all of the matched categories... unless there is a way to
craft a fast search that only pulls out the categories. Then I could
combine the fast category search with the fast paginated search. I'll
explore that some more.
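
A rough, untested sketch of how the two fast searches might be combined.
It assumes each record is stored as its own document rooted at entry, and
that an element-attribute range index of type int has been configured on
category/@categoryId (the lexicon call raises an error without one). The
$start and $page-size values are made up for illustration.

```xquery
xquery version "1.0-ml";

(: Hypothetical paging values; a real page would take these from the request. :)
declare variable $start as xs:integer := 1;
declare variable $page-size as xs:integer := 10;

let $query := cts:field-word-query("FullText", "president")

(: Category ids that occur in at least one matching fragment, read from the
   range index rather than from the documents themselves.
   ASSUMPTION: an int range index exists on category/@categoryId. :)
let $category-ids :=
  cts:element-attribute-values(
    xs:QName("category"), xs:QName("categoryId"), (), (), $query)

(: Only the fragments needed for the current page are fetched. :)
let $page :=
  cts:search(/entry, $query, "unfiltered")[$start to $start + $page-size - 1]

return
  <results total="{xdmp:estimate(cts:search(/entry, $query))}">
    <categories>{
      for $id in $category-ids
      return <category categoryId="{$id}"/>
    }</categories>
    { $page }
  </results>
```

The category list and the total both come from the indexes, so neither
should require looping through all matching records. Like the unfiltered
search, the lexicon lookup is resolved at fragment level.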

> As well as xdmp:query-meters(), you should consult 
> xdmp:query-trace() - see 
> http://developer.marklogic.com/pubs/4.0/books/performance.pdf

Here's the output from query-meters() and query-trace(). I didn't see
anything unusual, though I'm not sure what the value of the
<qm:elapsed-time> element means. (The search took approximately 5
seconds to return.)

/eval line 1: Analyzing path for search: doc()
/eval line 1: Step 1 is searchable: doc()
/eval line 1: Path is fully searchable.
/eval line 1: Gathering constraints.
/eval line 1: Search query contributed 1 constraint:
cts:field-word-query("FullText", "president", ("lang=en"), 1)
/eval line 1: Executing search.
/eval line 1: Selected 4090 fragments
<qm:query-meters
xsi:schemaLocation="http://marklogic.com/xdmp/query-meters query-meters.xsd"
xmlns:qm="http://marklogic.com/xdmp/query-meters"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<qm:elapsed-time>PT0S</qm:elapsed-time>
<qm:requests>1</qm:requests>
<qm:list-cache-hits>4</qm:list-cache-hits>
<qm:list-cache-misses>0</qm:list-cache-misses>
<qm:in-memory-list-hits>0</qm:in-memory-list-hits>
<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
<qm:compressed-tree-cache-hits>0</qm:compressed-tree-cache-hits>
<qm:compressed-tree-cache-misses>0</qm:compressed-tree-cache-misses>
<qm:in-memory-compressed-tree-hits>0</qm:in-memory-compressed-tree-hits>
<qm:value-cache-hits>0</qm:value-cache-hits>
<qm:value-cache-misses>0</qm:value-cache-misses>
<qm:regexp-cache-hits>0</qm:regexp-cache-hits>
<qm:regexp-cache-misses>0</qm:regexp-cache-misses>
<qm:link-cache-hits>0</qm:link-cache-hits>
<qm:link-cache-misses>0</qm:link-cache-misses>
<qm:fragments-added>0</qm:fragments-added>
<qm:fragments-deleted>0</qm:fragments-deleted>
<qm:fs-program-cache-hits>0</qm:fs-program-cache-hits>
<qm:fs-program-cache-misses>0</qm:fs-program-cache-misses>
<qm:db-program-cache-hits>0</qm:db-program-cache-hits>
<qm:db-program-cache-misses>0</qm:db-program-cache-misses>
<qm:fs-main-module-sequence-cache-hits>0</qm:fs-main-module-sequence-cache-hits>
<qm:fs-main-module-sequence-cache-misses>0</qm:fs-main-module-sequence-cache-misses>
<qm:db-main-module-sequence-cache-hits>0</qm:db-main-module-sequence-cache-hits>
<qm:db-main-module-sequence-cache-misses>0</qm:db-main-module-sequence-cache-misses>
<qm:fs-library-module-cache-hits>0</qm:fs-library-module-cache-hits>
<qm:fs-library-module-cache-misses>0</qm:fs-library-module-cache-misses>
<qm:db-library-module-cache-hits>0</qm:db-library-module-cache-hits>
<qm:db-library-module-cache-misses>0</qm:db-library-module-cache-misses>
<qm:fragments>
<qm:fragment>
<qm:root xmlns="">entry</qm:root>
<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:fragment>
</qm:fragments>
<qm:documents>
<qm:document>
<qm:uri>/C/TEMP/EBookDump/436672.xml</qm:uri>
<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:document>
</qm:documents>
</qm:query-meters>
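
One way to cross-check the timing, offered as a sketch and not as an
explanation of the PT0S above. As far as I understand, cts:search results
are fetched as they are consumed, so meters read before the results have
been iterated may reflect less work than the page eventually does. The
version below tries to force the first ten results to be fetched, then
reads xdmp:elapsed-time() and the meters in the same request. The page
size of ten is an arbitrary choice, and evaluation order is not strictly
guaranteed.

```xquery
xquery version "1.0-ml";
declare namespace qm = "http://marklogic.com/xdmp/query-meters";

let $query := cts:field-word-query("FullText", "president")
let $page := cts:search(fn:doc(), $query, "unfiltered")[1 to 10]
(: fn:count is intended to make the server fetch the results
   before the timing values are read. :)
let $n := fn:count($page)
return
  <timing results="{$n}">
    <elapsed>{ xdmp:elapsed-time() }</elapsed>
    { xdmp:query-meters()/qm:elapsed-time }
  </timing>
```

If both values stay near zero while the web page still takes seconds, the
time is probably being spent outside the query itself, for example in
transferring or rendering all 4090 results.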

> Looking at the structure of your documents, I'd try storing 
> each entry as a separate document. So your search would 
> become /entry rather than /content/entry.

I removed the content element and created and loaded a separate document
for each record. This didn't change the performance, however.

> 
> On 2008-12-29 11:58, Grant Lindley wrote:
> > I'm comparing full-text search performance between MarkLogic 4.0 and
> > SQL Server 2005 from a C# .NET web page.
> >
> > So far searches take about twice as long in MarkLogic compared to SQL
> > Server, and I'm looking for suggestions to improve performance in ML.
> >
> > The test data consists of 14,035 searchable records that take up 52 MB
> > in an XML text file.
> >
> > Here's a sample record:
> >
> > <content>
> >    <entry entryId="121866">
> >      <title>Alvar Aalto</title>
> >      <sortTitle>Aalto, Alvar</sortTitle>
> >      <searchTitle>Aalto, Alvar</searchTitle>
> >      <synopsis>Finland's most distinguished designer, Alvar Aalto is
> > renowned for his building designs as well as for his unique birchwood
> > furniture designs that are the archetype of Finnish furniture.
> > </synopsis>
> >      <mainText>  Finland's most distinguished architect and designer, ...
> > [long text removed]</mainText>
> >      <entryDate></entryDate>
> >      <searchExclude>False</searchExclude>
> >      <hyperlink>False</hyperlink>
> >      <furtherReading>Alvar Aalto Museum Web Site 
> > (http://www.alvaraalto.fi)</furtherReading>
> >      <siteCredits>ABC-CLIO</siteCredits>
> >      <citationCredits></citationCredits>
> >      <citationCredits2></citationCredits2>
> >      <accentUpdated>True</accentUpdated>
> >      <category categoryId="22">
> >        <displayTitle>Individuals</displayTitle>
> >        <formOrder>30</formOrder>
> >        <filterable>True</filterable>
> >        <categoryTypeId>5</categoryTypeId>
> >        <longDescription>Individuals</longDescription>
> >      </category>
> >      <subTopic subTopicId="62" topicId="3">
> >        <displayTitle>Finland</displayTitle>
> >        <description>Finland</description>
> >        <sortOrder>-1</sortOrder>
> >      </subTopic>
> >      <topic topicId="3">
> >        <description>Europe</description>
> >      </topic>
> >    </entry>
> > </content>
> >
> > The elements that are included in the search are title, sortTitle, 
> > mainText, and siteCredits.
> >
> > For the MarkLogic index settings, I have selected only basic stemmed
> > searches and fast phrase searches.
> >
> > The best results so far have been obtained when the entry element has
> > been added as a fragment root.
> >
> > Here's the code currently being used to execute the search:
> >
> >    cts:search(fn:doc()//content/entry, 
> > cts:field-word-query("FullText", "president"), "unfiltered" )
> >
> > where "FullText" is a field that has been set up with the four 
> > searchable elements above.
> >
> > I tried running with xdmp:query-meters() and didn't find any cache 
> > misses.
> >
> > I'm experienced with SQL Server, but brand new to MarkLogic, so any 
> > suggestions would be appreciated.
> >
> > -Grant
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > General mailing list
> > [email protected]
> > http://xqzone.com/mailman/listinfo/general
> 
