Susan,
I may not understand your use-case and document structure correctly, but
doesn't this query have to access every available taxonomy document? I
say this because it lists out the isbn13, booktitle, and authorsort
elements, but only authorsort comes from a range index. The query as
written is actually a little worse than that, because it calls
cts:search once per author. If some documents have multiple authors, it
will access each document multiple times.
In the end, this query probably calls cts:search at least 1000 times.
That would account for the bulk of the elapsed time. On average that's
less than 6 ms per cts:search, which really isn't bad for what it's
doing. I think we can improve on that a little bit, but to get a
dramatic speed-up we need to re-think the problem.
How fast does this query need to run? Could you run this query whenever
the relevant documents are updated, and store the results in a new
document? If so, 5.5 seconds might be acceptable because it would only
happen once, at the end of each batch of updates. Any user access to the
stored version of the author-list would be much faster, since all the
hard work has already been done.
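A minimal sketch of that caching step follows, to be run at the end of each batch load. It assumes the author-list query has been saved as a main module at the hypothetical path /build-author-list.xqy, and it stores the output at the equally hypothetical URI /cache/author-list.xml:

```xquery
xquery version "1.0-ml";

(: Caching sketch. Both names below are hypothetical:
   /build-author-list.xqy is assumed to be a main module holding the
   author-list query, and /cache/author-list.xml is the cache URI. :)
let $headings := xdmp:invoke("/build-author-list.xqy")
return xdmp:document-insert(
  "/cache/author-list.xml",
  element author-list { $headings })
```

After that, the browse page only needs doc('/cache/author-list.xml')/author-list/heading, which is a single document read.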
That would be my preferred solution. But while you are thinking that
over, let's optimize the query a bit. In this case I don't think more
range indexes will help: there would still be around a thousand trips
through the database to build the page. Instead we can eliminate the
extra document reads by making a single pass over all the taxonomy docs.
We'll use a map to accumulate results in memory.
let $map := map:map()
(: single pass: each taxonomy doc is read once, and each author name
   becomes a map key that accumulates one title element per book :)
let $build :=
  for $t in collection('abce')/taxonomy
  for $name in $t/Authors/author/authorsort/string()
  return map:put(
    $map, $name,
    (map:get($map, $name),
     element title { $t/isbn13, $t/booktitle }))
(: second pass reads only the in-memory map, not the database :)
for $v in map:keys($map)
order by $v
return element heading {
  attribute type { 'author' },
  element name { $v },
  for $t in map:get($map, $v)
  return element title {
    attribute doc-id { $t/isbn13 },
    $t/booktitle } }
That may look more complicated, but I think you'll find that it's faster,
mostly because it's guaranteed to touch each document only once. I
mocked up a test using some Medline documents, and this was about 5x
faster than the baseline. It also doesn't require any range indexes.
Possibly it still isn't fast enough, and you've decided that caching the
output in a new document won't work. If so, then I think you'll need to
use co-occurrences. This gets tricky because you want three pieces of
information, but co-occurrence only supports two elements at a time. But
I think you can still manage it if you're willing to create a new
element that combines title and isbn13 in a delimited string.
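For documents that are already loaded, a one-off backfill might look like the sketch below. The element name isbn-title is a hypothetical choice. Putting isbn13 first means the first comma always marks the split, since an ISBN never contains a comma. For new content, the same element would more naturally be added when the taxonomy is built before load. This sketch also updates everything in one transaction, so a large collection may need batching.

```xquery
xquery version "1.0-ml";

(: One-off backfill sketch. "isbn-title" is a hypothetical element name.
   isbn13 goes first because an ISBN never contains a comma, so the first
   comma in the value always separates the ISBN from the title.
   Only taxonomies that don't have the element yet are touched. :)
for $t in collection('abce')/taxonomy[not(isbn-title)]
return xdmp:node-insert-child(
  $t,
  element isbn-title { concat($t/isbn13, ',', $t/booktitle) })
```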
Then you could build a range index on your new isbn13-title element
(let's say it's comma-delimited) and use
cts:element-value-co-occurrences() to build the entire result set
without any document fetches.
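Here is a rough sketch of that co-occurrence query. It assumes the hypothetical isbn-title element ("isbn13,booktitle") and string range indexes on both authorsort and isbn-title in the default collation, which is why no collation option is passed. It reuses the same map pattern to group titles by author, and emits the same heading/title shape as before:

```xquery
xquery version "1.0-ml";

(: Sketch only. Assumes string range indexes on authorsort and on the
   hypothetical isbn-title element ("isbn13,booktitle"), both in the
   default collation, so every value below comes from the indexes. :)
let $map := map:map()
let $build :=
  for $c in cts:element-value-co-occurrences(
      xs:QName('authorsort'), xs:QName('isbn-title'),
      (), cts:collection-query('abce'))
  let $author := string($c/cts:value[1])
  return map:put($map, $author,
    (map:get($map, $author), string($c/cts:value[2])))
(: group the isbn-title strings by author and split them back apart :)
for $v in map:keys($map)
order by $v
return element heading {
  attribute type { 'author' },
  element name { $v },
  for $pair in map:get($map, $v)
  return element title {
    attribute doc-id { substring-before($pair, ',') },
    element booktitle { substring-after($pair, ',') } } }
```

Co-occurrences pair values that appear in the same fragment, so this also assumes each taxonomy sits in a single fragment.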
-- Mike
On 2009-07-17 07:02, Susan Basch wrote:
Hi all,
Apologies in advance for the length of this email . . .
I'm trying to generate a list of unique authors, with their associated titles, from
a taxonomy element that's added to each of our titles before it's loaded into
Mark Logic.
The taxonomy looks something like this:
<taxonomy>
  <booktitle>Daily Lives of Civilians in Wartime Twentieth-Century Europe</booktitle>
  <booktitle_sort>Daily Lives of Civilians in Wartime Twentieth-Century Europe</booktitle_sort>
  ...
  <Authors>
    <author authorId="130670">
      <authorsort>Atkin, Nicholas</authorsort>
      <firstname>Nicholas</firstname>
      <middlename /><lastname>Atkin</lastname>
      <role>Author</role><rank>1</rank>
    </author>
  </Authors>
</taxonomy>
There's an element-range-index on the authorsort and booktitle_sort elements.
There can be more than one author element.
And the query (so far) looks something like this:
for $v in cts:element-values(
    xs:QName('authorsort'),
    (), (),
    cts:collection-query('abce'))
return
  element heading {
    attribute type { 'author' },
    element name { $v },
    let $titles :=
      ( cts:search(collection("abce")//taxonomy,
          cts:element-value-query(xs:QName('authorsort'), $v),
          'unfiltered') )
    for $title in $titles
    return element title {
      attribute doc-id { $title/isbn13 },
      $title/booktitle }
  }
This approach seems to work fairly well with the element range indexes on our
subject and date taxonomy elements, but is just too slow when it comes to the
authors.
Here's an excerpt from xdmp:query-trace:
xdmp:eval("(: browse testing :) xquery version"1.0-ml";...", (),<options
xmlns="xdmp:eval"><database>7839305530622276384</database><modules>0</modules><def...</options>)
2009-07-17 05:13:42.584 Info: 8002-research: line 48: Analyzing path for search: collection("abce")/descendant::taxonomy
2009-07-17 05:13:42.584 Info: 8002-research: line 48: Step 1 is searchable: collection("abce")
2009-07-17 05:13:42.584 Info: 8002-research: line 48: Step 2 is searchable: descendant::taxonomy
2009-07-17 05:13:42.584 Info: 8002-research: line 48: Path is fully searchable.
2009-07-17 05:13:42.584 Info: 8002-research: line 48: Gathering constraints.
2009-07-17 05:13:42.584 Info: 8002-research: line 48: Step 1 contributed 1 constraint: collection("abce")
2009-07-17 05:13:42.584 Info: 8002-research: line 48: Step 2 test contributed 1 constraint: taxonomy
2009-07-17 05:13:42.584 Info: 8002-research: line 48: Comparison contributed string range value constraint: authorsort = "Zimmerman, Joseph F."
2009-07-17 05:13:42.584 Info: 8002-research: line 48: Search query contributed 1 constraint: cts:element-range-query(QName("", "authorsort"), "=", "Zimmerman, Joseph F.", ("collation=http://marklogic.com/collation/"), 1)
2009-07-17 05:13:42.584 Info: 8002-research: line 48: Executing search.
2009-07-17 05:13:42.584 Info: 8002-research: line 48: Selected 4 fragments
<qm:query-meters xsi:schemaLocation="http://marklogic.com/xdmp/query-meters query-meters.xsd"
xmlns:qm="http://marklogic.com/xdmp/query-meters"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<qm:elapsed-time>PT5.438S</qm:elapsed-time>
<qm:requests>0</qm:requests>
<qm:list-cache-hits>215013</qm:list-cache-hits>
<qm:list-cache-misses>0</qm:list-cache-misses>
<qm:in-memory-list-hits>0</qm:in-memory-list-hits>
<qm:expanded-tree-cache-hits>2205</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>4554</qm:expanded-tree-cache-misses>
<qm:compressed-tree-cache-hits>4554</qm:compressed-tree-cache-hits>
<qm:compressed-tree-cache-misses>0</qm:compressed-tree-cache-misses>
<qm:in-memory-compressed-tree-hits>0</qm:in-memory-compressed-tree-hits>
<qm:value-cache-hits>0</qm:value-cache-hits>
<qm:value-cache-misses>0</qm:value-cache-misses>
...
<qm:document>
<qm:uri>/abce/C9129.xml</qm:uri>
<qm:expanded-tree-cache-hits>0</qm:expanded-tree-cache-hits>
<qm:expanded-tree-cache-misses>1</qm:expanded-tree-cache-misses>
</qm:document>
I seem to be getting a lot of expanded-tree-cache-misses, but I'm not sure how
to correct for that.
Is there a more efficient way to generate our list of authors?
Thanks!
Susan
_______________________________________________
General mailing list
http://xqzone.com/mailman/listinfo/general