Nick - you can also create range indexes explicitly in MarkLogic, and these will really help with the performance of joins, just as they do in eXist.
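
As a minimal sketch of what creating such an index might look like with the Admin API: the database name "Documents", the empty namespaces, and the codepoint collation below are assumptions for illustration, not details from Nick's setup. The same index can also be defined through the Admin UI.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";

(: Assumption: the content is stored in a database called "Documents". :)
let $config := admin:get-configuration()
let $db := xdmp:database("Documents")

(: String range index on the id attribute of text elements.
   Assumes neither the element nor the attribute is in a namespace. :)
let $index := admin:database-range-element-attribute-index(
    "string",                                    (: scalar type :)
    "", "text",                                  (: parent element namespace, local name :)
    "", "id",                                    (: attribute namespace, local name :)
    "http://marklogic.com/collation/codepoint",  (: collation, assumed :)
    fn:false())                                  (: range value positions :)

return admin:save-configuration(
    admin:database-add-range-element-attribute-index($config, $db, $index))
```

The index type should match how the values are compared; the string type is used here because the ids in the trace look like "ElementName143001".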

-Mike

On 03/16/2012 08:26 AM, Nick Tuckett wrote:
I'm evaluating MarkLogic as a possible way to store and access around 25 MB (and growing) of fairly complex XML data. For one particular type of common query in my application, I'm seeing drastically different performance between MarkLogic and eXist. I would be very grateful for any feedback or advice on how to improve this performance.

One common feature of this data is the use of attributes containing identifying values that reference other elements in the collection; one example is referencing localised text from a common XML file. I have been using a fairly simple query to benchmark performance that looks like this:

for $x in collection('/db/content')//(elementa|elementb|elementc|elementd|elemente|elementf)
let $e := doc('/db/language/lang_en.xml')//text[@id=$x/@localisedtextid]
order by $x/@id
return
<localisedtext quest="{$e}"/>
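
In the query above, the lookup in the let clause is evaluated once per $x. A minimal sketch of one alternative, assuming MarkLogic's 1.0-ml dialect and its built-in map:map type, builds the id-to-text lookup once and then probes it inside the loop. This is an illustration under those assumptions, not a measured fix. Note one behavioural difference: if several text elements share the same @id, the map keeps only the last one, whereas the original predicate returns all of them.

```xquery
xquery version "1.0-ml";

(: Build the lookup table once: @id value -> text element. :)
let $texts := map:map()
let $_ :=
  for $t in doc('/db/language/lang_en.xml')//text[@id]
  return map:put($texts, string($t/@id), $t)

(: Probe the map for each content element instead of re-running the path. :)
for $x in collection('/db/content')//(elementa|elementb|elementc|elementd|elemente|elementf)
order by $x/@id
return
  <localisedtext quest="{map:get($texts, string($x/@localisedtextid))}"/>
```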

The benchmark content has around 2500 instances for this particular case. With everything else constant (hardware, OS, content) I see drastically different performance between MarkLogic and eXist. The former takes around 59 seconds to return the data for all instances, the latter takes 8 seconds.

As I understand it, MarkLogic sets up indexing automatically, including indexing on element-attribute pairs. To match this, I created an explicit equivalent index for eXist for the text/@id pair for use in this case.

For MarkLogic, running the query with the profiler showed that around 75% of the execution time went on '@id = $x/@localisedtextid', and query tracing produced the following output:

Initial part of query:

2012-03-16 11:35:35.743 Info: App-Services: at 2:10: xdmp:eval("xdmp:query-trace(true()),&#10;for $x in collection('/db/content...", (), <options xmlns="xdmp:eval"><database>1488253557778688591</database><modules>148825355777868...</options>)
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Analyzing path for $x: fn:collection("/db/content")/descendant-or-self::node()/(elementa|elementb|elementc|elementd|elemente|elementf)
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Step 1 is searchable: fn:collection("/db/content")
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Step 2 does not use indexes: descendant-or-self::node()
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Step 3 is searchable: (elementa|elementb|elementc|elementd|elemente|elementf)
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Path is fully searchable.
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Gathering constraints.
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Step 1 contributed 1 constraint: fn:collection("/db/content")
2012-03-16 11:35:35.743 Info: App-Services: at 2:48: Step 3 contributed 1 constraint: elementa
2012-03-16 11:35:35.743 Info: App-Services: at 2:62: Step 3 contributed 1 constraint: elementb
2012-03-16 11:35:35.743 Info: App-Services: at 2:81: Step 3 contributed 1 constraint: elementc
2012-03-16 11:35:35.743 Info: App-Services: at 2:110: Step 3 contributed 1 constraint: elementd
2012-03-16 11:35:35.743 Info: App-Services: at 2:122: Step 3 contributed 1 constraint: elemente
2012-03-16 11:35:35.743 Info: App-Services: at 2:132: Step 3 contributed 1 constraint: elementf
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Executing search.
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Selected 8 fragments to filter.

Iterated part of query (repeat N times...)

2012-03-16 11:35:35.743 Info: App-Services: at 3:48: xdmp:eval("xdmp:query-trace(true()),&#10;for $x in collection('/db/content...", (), <options xmlns="xdmp:eval"><database>1488253557778688591</database><modules>148825355777868...</options>)
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Analyzing path: fn:doc("/db/language/lang_en.xml")/descendant::text[@id = xs:untypedAtomic("ElementName143001")]
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 1 is searchable: fn:doc("/db/language/lang_en.xml")
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 2 is searchable: descendant::text[@id = xs:untypedAtomic("ElementName143001")]
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Path is fully searchable.
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Gathering constraints.
2012-03-16 11:35:35.743 Info: App-Services: at 3:10: Step 1 contributed 1 constraint: fn:doc("/db/language/lang_en.xml")
2012-03-16 11:35:35.743 Info: App-Services: at 3:54: Comparison contributed hash value constraint: text/@id = xs:untypedAtomic("ElementName143001")
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 2 predicate 1 contributed 1 constraint: @id = xs:untypedAtomic("ElementName143001")
2012-03-16 11:35:35.743 Info: App-Services: at 3:54: Comparison contributed hash value constraint: text/@id = xs:untypedAtomic("ElementName143001")
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 2 predicate 1 contributed 1 constraint: @id = xs:untypedAtomic("ElementName143001")
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 2 contributed 2 constraints: descendant::text[@id = xs:untypedAtomic("ElementName143001")]
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Executing search.
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Selected 1 fragment to filter

Query meters:

<qm:query-meters xsi:schemaLocation="http://marklogic.com/xdmp/query-meters query-meters.xsd" xmlns:qm="http://marklogic.com/xdmp/query-meters" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<qm:elapsed-time>PT56.655659S</qm:elapsed-time>
<qm:requests>0</qm:requests>
<qm:list-cache-hits>1043</qm:list-cache-hits>
<qm:list-cache-misses>0</qm:list-cache-misses>
<qm:in-memory-list-hits>0</qm:in-memory-list-hits>
<qm:expanded-tree-cache-hits>519</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
<qm:compressed-tree-cache-hits>0</qm:compressed-tree-cache-hits>
<qm:compressed-tree-cache-misses>0</qm:compressed-tree-cache-misses>
<qm:in-memory-compressed-tree-hits>0</qm:in-memory-compressed-tree-hits>
<qm:value-cache-hits>6672643</qm:value-cache-hits>
<qm:value-cache-misses>6673683</qm:value-cache-misses>
<qm:regexp-cache-hits>0</qm:regexp-cache-hits>
<qm:regexp-cache-misses>0</qm:regexp-cache-misses>
<qm:link-cache-hits>0</qm:link-cache-hits>
<qm:link-cache-misses>0</qm:link-cache-misses>
<qm:filter-hits>0</qm:filter-hits>
<qm:filter-misses>0</qm:filter-misses>
<qm:fragments-added>0</qm:fragments-added>
<qm:fragments-deleted>0</qm:fragments-deleted>
<qm:fs-program-cache-hits>0</qm:fs-program-cache-hits>
<qm:fs-program-cache-misses>1</qm:fs-program-cache-misses>
<qm:db-program-cache-hits>0</qm:db-program-cache-hits>
<qm:db-program-cache-misses>0</qm:db-program-cache-misses>
<qm:env-program-cache-hits>0</qm:env-program-cache-hits>
<qm:env-program-cache-misses>0</qm:env-program-cache-misses>
<qm:fs-main-module-sequence-cache-hits>0</qm:fs-main-module-sequence-cache-hits>
<qm:fs-main-module-sequence-cache-misses>0</qm:fs-main-module-sequence-cache-misses>
<qm:db-main-module-sequence-cache-hits>0</qm:db-main-module-sequence-cache-hits>
<qm:db-main-module-sequence-cache-misses>0</qm:db-main-module-sequence-cache-misses>
<qm:fs-library-module-cache-hits>0</qm:fs-library-module-cache-hits>
<qm:fs-library-module-cache-misses>0</qm:fs-library-module-cache-misses>
<qm:db-library-module-cache-hits>0</qm:db-library-module-cache-hits>
<qm:db-library-module-cache-misses>0</qm:db-library-module-cache-misses>
<qm:fragments>
<qm:fragment>
<qm:root>contents</qm:root>
<qm:expanded-tree-cache-hits>511</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:fragment>
<qm:fragment>
<qm:root>database</qm:root>
<qm:expanded-tree-cache-hits>8</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:fragment>
</qm:fragments>
<qm:documents>
<qm:document>
<qm:uri>/db/content/file1.xml</qm:uri>
<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:document>
<qm:document>
<qm:uri>/db/content/file2.xml</qm:uri>
<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:document>
<qm:document>
<qm:uri>/db/language/lang_en.xml</qm:uri>
<qm:expanded-tree-cache-hits>511</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:document>
<qm:document>
<qm:uri>/db/content/file3.xml</qm:uri>
<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:document>
<qm:document>
<qm:uri>/db/content/file4.xml</qm:uri>
<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:document>
<qm:document>
<qm:uri>/db/content/file5.xml</qm:uri>
<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:document>
<qm:document>
<qm:uri>/db/content/file6.xml</qm:uri>
<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:document>
<qm:document>
<qm:uri>/db/content/file7.xml</qm:uri>
<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:document>
<qm:document>
<qm:uri>/db/content/file8.xml</qm:uri>
<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:document>
</qm:documents>
<qm:hosts/>
</qm:query-meters>


_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general