The question posed by Helen got me curious about the performance of
MarkLogic when it comes to dealing with trees built during the query.

I put together a set of 636 PubMed articles (so, a tiny set compared to
the amount Helen is dealing with). This set of articles contains 2847
author elements, 2634 of which are unique.

I used the XQuery I sent to the list and timed the results.  The total
execution time was about 20 seconds!  It looks like 19.7 or so of those
seconds are spent weeding out the unique authors -- it only takes the
server about .3 seconds to build the list of author names and the list
of unique author keys, and the rest of the time is spent looking at
the authors list for those unique author keys.

I timed this against Saxon, on the same machine, where Saxon loaded the
files up from disk.  It took Saxon about 1.5 seconds to load the
files, but the amount of time to actually execute the query was only
about .1 to .2 seconds (so under 2 seconds total execution time).

This struck me as odd; I wouldn't have expected this much of a difference.
Since MarkLogic Server was blazingly fast at actually loading the
documents (which makes sense, xdmp:query-meters() shows the documents were
read from cache), I assume the difference is that Saxon is much better
at building an index for the temporary tree -- Is MarkLogic not doing
anything similar?  If not, is there a technique one can use to force it
to?  Is there some other way one should approach a manipulation like this?

I ask because this seemed like a typical sort of problem one might need
to solve in XQuery (when the documents don't have quite as granular a
view as one needs, it seems reasonable to assume one should be able to
build up the granular representation as part of the query).

<result>{
  let $authors :=
    for $author in
      collection()/MedlineCitationSet/MedlineCitation/Article/AuthorList/Author
    let $surname := data($author/LastName)
    let $fname   := data($author/FirstName)
    let $key     := string-join(($surname,$fname), "|")
    where exists($author/LastName)
    return
      <author key="{$key}">{
        <surname>{$surname}</surname>,
        if ($fname ne '') then <fname>{$fname}</fname> else ()
      }</author>
  let $unique :=
    for $key in distinct-values($authors/@key)
    order by $key
    return $key
  return
    for $key in $unique
    return <author>[EMAIL PROTECTED]/*}</author> (: the dreadfully slow part ... :)
}</result>
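
One possible way to avoid the repeated per-key lookup into $authors is
to sort the temporary author elements once and keep only the first
element of each run of equal keys.  The following is a minimal, untimed
sketch in plain XQuery 1.0.  The three inline author elements are
hypothetical sample data standing in for the $authors sequence built in
the query above, and the sketch assumes one entry per key is wanted.
Whether this is actually faster on a given engine depends on how cheap
positional access into $sorted is, which is an assumption here.

```xquery
(: Hypothetical sample data standing in for the $authors sequence above. :)
let $authors := (
  <author key="Smith|Ann"><surname>Smith</surname><fname>Ann</fname></author>,
  <author key="Jones|Bob"><surname>Jones</surname><fname>Bob</fname></author>,
  <author key="Smith|Ann"><surname>Smith</surname><fname>Ann</fname></author>
)
(: Sort once so that authors with the same key become adjacent. :)
let $sorted :=
  for $a in $authors
  order by string($a/@key)
  return $a
return
  <result>{
    for $a at $i in $sorted
    (: Keep an author only if its key differs from the previous one's. :)
    where $i eq 1 or string($a/@key) ne string($sorted[$i - 1]/@key)
    return <author>{$a/*}</author>
  }</result>
```

The design choice is to replace one predicate scan of $authors per
unique key with a single sort followed by a comparison against the
preceding element.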


- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
James A. Robinson                       [EMAIL PROTECTED]
Stanford University HighWire Press      http://highwire.stanford.edu/
+1 650 7237294 (Work)                   +1 650 7259335 (Fax)