Nick - you can also create range indexes explicitly in MarkLogic, and
these will really help with the performance of joins, just as they do in
eXist.
-Mike
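A minimal sketch of Mike's suggestion, using the MarkLogic Admin API to add a string element-attribute range index on `text/@id` to the current database (`xdmp:database()`). It assumes the elements and attribute are in no namespace and that the codepoint collation is acceptable. It also assumes the user has admin privileges. These settings are illustrative choices, not values taken from this thread.

```xquery
xquery version "1.0-ml";

(: Sketch only: adds an element-attribute range index on text/@id to the
   current database. Assumptions: no namespaces, xs:string values,
   codepoint collation, range value positions disabled. :)
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $dbid   := xdmp:database()            (: ID of the current database :)
let $index  := admin:database-range-element-attribute-index(
                 "string",                  (: scalar type :)
                 "", "text",                (: parent element namespace, local name :)
                 "", "id",                  (: attribute namespace, local name :)
                 "http://marklogic.com/collation/codepoint",
                 fn:false())                (: range value positions off :)
return admin:save-configuration(
  admin:database-add-range-element-attribute-index($config, $dbid, $index))
```

Assuming the reindexer is enabled (the default), existing documents are reindexed in the background after the configuration is saved. Queries can only rely on the new index once that finishes.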
On 03/16/2012 08:26 AM, Nick Tuckett wrote:
I'm evaluating MarkLogic as a possible way to store and access around
25 MB (and growing) of fairly complex XML data. For one common type of
query in my application, I'm seeing drastically different performance
between MarkLogic and eXist. I would be very grateful for any feedback
or advice on how to improve this performance.
Much of this data uses attributes containing identifying values that
reference other elements in the collection, for example to reference
localised text held in a common XML file. I have been using a fairly
simple query to benchmark performance, which looks like this:
for $x in collection('/db/content')//(elementa|elementb|elementc|elementd|elemente|elementf)
let $e := doc('/db/language/lang_en.xml')//text[@id=$x/@localisedtextid]
order by $x/@id
return
  <localisedtext quest="{$e}"/>
The benchmark content has around 2,500 instances of these elements.
With everything else held constant (hardware, OS, content), MarkLogic
takes around 59 seconds to return the data for all instances, while
eXist takes 8 seconds.
As I understand it, MarkLogic sets up indexing automatically,
including indexes on element-attribute value pairs. To match this, I
created an equivalent explicit index on the text/@id pair in eXist for
this query.
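For comparison with the explicit eXist index, here is a sketch of the same query in MarkLogic with each lookup expressed as a range query. It assumes a string element-attribute range index on `text/@id` with the codepoint collation already exists in the database; without a matching index the query raises an error. Whether it beats the original predicate has not been measured here. The outcome may depend on how `lang_en.xml` is fragmented.

```xquery
xquery version "1.0-ml";

(: Sketch only: the benchmark join with the text/@id lookup resolved via
   cts:element-attribute-range-query. Assumes a string range index on
   text/@id with the codepoint collation is configured. :)
for $x in collection('/db/content')//(elementa|elementb|elementc|elementd|elemente|elementf)
let $e :=
  cts:search(
    doc('/db/language/lang_en.xml')//text,
    cts:element-attribute-range-query(
      xs:QName("text"), xs:QName("id"), "=",
      fn:string($x/@localisedtextid),
      "collation=http://marklogic.com/collation/codepoint"))
order by $x/@id
return <localisedtext quest="{$e}"/>
```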
For MarkLogic, running the query under the profiler showed that around
75% of the execution time was spent evaluating
'@id = $x/@localisedtextid', and query tracing produced the following
output:
Initial part of query:
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: xdmp:eval("xdmp:query-trace(true()), for $x in collection('/db/content...", (), <options xmlns="xdmp:eval"><database>1488253557778688591</database><modules>148825355777868...</options>)
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Analyzing path for $x: fn:collection("/db/content")/descendant-or-self::node()/(elementa|elementb|elementc|elementd|elemente|elementf)
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Step 1 is searchable: fn:collection("/db/content")
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Step 2 does not use indexes: descendant-or-self::node()
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Step 3 is searchable: (elementa|elementb|elementc|elementd|elemente|elementf)
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Path is fully searchable.
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Gathering constraints.
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Step 1 contributed 1 constraint: fn:collection("/db/content")
2012-03-16 11:35:35.743 Info: App-Services: at 2:48: Step 3 contributed 1 constraint: elementa
2012-03-16 11:35:35.743 Info: App-Services: at 2:62: Step 3 contributed 1 constraint: elementb
2012-03-16 11:35:35.743 Info: App-Services: at 2:81: Step 3 contributed 1 constraint: elementc
2012-03-16 11:35:35.743 Info: App-Services: at 2:110: Step 3 contributed 1 constraint: elementd
2012-03-16 11:35:35.743 Info: App-Services: at 2:122: Step 3 contributed 1 constraint: elemente
2012-03-16 11:35:35.743 Info: App-Services: at 2:132: Step 3 contributed 1 constraint: elementf
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Executing search.
2012-03-16 11:35:35.743 Info: App-Services: at 2:10: Selected 8 fragments to filter.
Iterated part of query (repeated N times):
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: xdmp:eval("xdmp:query-trace(true()), for $x in collection('/db/content...", (), <options xmlns="xdmp:eval"><database>1488253557778688591</database><modules>148825355777868...</options>)
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Analyzing path: fn:doc("/db/language/lang_en.xml")/descendant::text[@id = xs:untypedAtomic("ElementName143001")]
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 1 is searchable: fn:doc("/db/language/lang_en.xml")
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 2 is searchable: descendant::text[@id = xs:untypedAtomic("ElementName143001")]
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Path is fully searchable.
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Gathering constraints.
2012-03-16 11:35:35.743 Info: App-Services: at 3:10: Step 1 contributed 1 constraint: fn:doc("/db/language/lang_en.xml")
2012-03-16 11:35:35.743 Info: App-Services: at 3:54: Comparison contributed hash value constraint: text/@id = xs:untypedAtomic("ElementName143001")
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 2 predicate 1 contributed 1 constraint: @id = xs:untypedAtomic("ElementName143001")
2012-03-16 11:35:35.743 Info: App-Services: at 3:54: Comparison contributed hash value constraint: text/@id = xs:untypedAtomic("ElementName143001")
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 2 predicate 1 contributed 1 constraint: @id = xs:untypedAtomic("ElementName143001")
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Step 2 contributed 2 constraints: descendant::text[@id = xs:untypedAtomic("ElementName143001")]
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Executing search.
2012-03-16 11:35:35.743 Info: App-Services: at 3:48: Selected 1 fragment to filter
Query meters:
<qm:query-meters
xsi:schemaLocation="http://marklogic.com/xdmp/query-meters
query-meters.xsd" xmlns:qm="http://marklogic.com/xdmp/query-meters"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<qm:elapsed-time>PT56.655659S</qm:elapsed-time>
<qm:requests>0</qm:requests>
<qm:list-cache-hits>1043</qm:list-cache-hits>
<qm:list-cache-misses>0</qm:list-cache-misses>
<qm:in-memory-list-hits>0</qm:in-memory-list-hits>
<qm:expanded-tree-cache-hits>519</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
<qm:compressed-tree-cache-hits>0</qm:compressed-tree-cache-hits>
<qm:compressed-tree-cache-misses>0</qm:compressed-tree-cache-misses>
<qm:in-memory-compressed-tree-hits>0</qm:in-memory-compressed-tree-hits>
<qm:value-cache-hits>6672643</qm:value-cache-hits>
<qm:value-cache-misses>6673683</qm:value-cache-misses>
<qm:regexp-cache-hits>0</qm:regexp-cache-hits>
<qm:regexp-cache-misses>0</qm:regexp-cache-misses>
<qm:link-cache-hits>0</qm:link-cache-hits>
<qm:link-cache-misses>0</qm:link-cache-misses>
<qm:filter-hits>0</qm:filter-hits>
<qm:filter-misses>0</qm:filter-misses>
<qm:fragments-added>0</qm:fragments-added>
<qm:fragments-deleted>0</qm:fragments-deleted>
<qm:fs-program-cache-hits>0</qm:fs-program-cache-hits>
<qm:fs-program-cache-misses>1</qm:fs-program-cache-misses>
<qm:db-program-cache-hits>0</qm:db-program-cache-hits>
<qm:db-program-cache-misses>0</qm:db-program-cache-misses>
<qm:env-program-cache-hits>0</qm:env-program-cache-hits>
<qm:env-program-cache-misses>0</qm:env-program-cache-misses>
<qm:fs-main-module-sequence-cache-hits>0</qm:fs-main-module-sequence-cache-hits>
<qm:fs-main-module-sequence-cache-misses>0</qm:fs-main-module-sequence-cache-misses>
<qm:db-main-module-sequence-cache-hits>0</qm:db-main-module-sequence-cache-hits>
<qm:db-main-module-sequence-cache-misses>0</qm:db-main-module-sequence-cache-misses>
<qm:fs-library-module-cache-hits>0</qm:fs-library-module-cache-hits>
<qm:fs-library-module-cache-misses>0</qm:fs-library-module-cache-misses>
<qm:db-library-module-cache-hits>0</qm:db-library-module-cache-hits>
<qm:db-library-module-cache-misses>0</qm:db-library-module-cache-misses>
<qm:fragments>
<qm:fragment>
<qm:root>contents</qm:root>
<qm:expanded-tree-cache-hits>511</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:fragment>
<qm:fragment>
<qm:root>database</qm:root>
<qm:expanded-tree-cache-hits>8</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:fragment>
</qm:fragments>
<qm:documents>
<qm:document>
<qm:uri>/db/content/file1.xml</qm:uri>
<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:document>
<qm:document>
<qm:uri>/db/content/file2.xml</qm:uri>
<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:document>
<qm:document>
<qm:uri>/db/language/lang_en.xml</qm:uri>
<qm:expanded-tree-cache-hits>511</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:document>
<qm:document>
<qm:uri>/db/content/file3.xml</qm:uri>
<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:document>
<qm:document>
<qm:uri>/db/content/file4.xml</qm:uri>
<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:document>
<qm:document>
<qm:uri>/db/content/file5.xml</qm:uri>
<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:document>
<qm:document>
<qm:uri>/db/content/file6.xml</qm:uri>
<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:document>
<qm:document>
<qm:uri>/db/content/file7.xml</qm:uri>
<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:document>
<qm:document>
<qm:uri>/db/content/file8.xml</qm:uri>
<qm:expanded-tree-cache-hits>1</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>0</qm:expanded-tree-cache-misses>
</qm:document>
</qm:documents>
<qm:hosts/>
</qm:query-meters>
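A possible alternative, not proposed in this thread, is a lookup-table join. The iterated part of the trace shows that each of the roughly 2,500 lookups narrows the search to a single fragment (lang_en.xml), which is then filtered to evaluate `text[@id = ...]`. That is consistent with the millions of value-cache hits and misses in the meters. The sketch below walks lang_en.xml once, builds an id-to-element `map:map`, and then resolves each `@localisedtextid` with `map:get`. It assumes every `text/@id` value is unique; with duplicates, the last one stored wins, whereas the original predicate would return all of them. The helper name `local:build-lookup` is hypothetical.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: builds an id -> <text> lookup table in one pass.
   Assumes text/@id values are unique (later duplicates overwrite earlier ones). :)
declare function local:build-lookup($texts as element(text)*) as map:map
{
  let $lookup := map:map()
  return (
    for $t in $texts
    return map:put($lookup, fn:string($t/@id), $t),  (: map:put returns () :)
    $lookup
  )
};

let $lookup := local:build-lookup(doc('/db/language/lang_en.xml')//text[@id])
for $x in collection('/db/content')//(elementa|elementb|elementc|elementd|elemente|elementf)
let $e := map:get($lookup, fn:string($x/@localisedtextid))
order by $x/@id
return <localisedtext quest="{$e}"/>
```

The trade-off is memory: the whole set of `text` elements is held in the map for the duration of the request. That should be modest for a 25 MB collection, but it has not been measured here.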
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general