Hi Helen,

Have you tried using value lexicons for this?  Value lexicons are built
on range indexes and are queried with the cts:element-values and
cts:element-value-match APIs.  There are several ways you might
accomplish this with lexicons.

For example, you could create three string range indexes, one on each of
the surname, fname, and midname elements.  Reading the values from those
indexes will likely be much faster than calling distinct-values on the
search results.  You could also use cts:element-value-match with a
wildcard pattern to get the a*, b*, etc. functionality.
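
As a rough sketch of the per-letter lookup, assuming a string range
index has already been configured on the surname element (the directory
URI is simply the one from your second query, and the wrapper element
names are made up for illustration), something like this might work.
Whether "a*" also matches "A..." depends on the collation of the index,
and the availability of the query argument may depend on your server
version:

```xquery
(: Sketch only: assumes a string range index on the surname element. :)
for $letter in ("a", "b", "c")   (: extend through "z" as needed :)
return
  <letter name="{ $letter }">{
    (: The pattern is resolved against the lexicon, not the documents. :)
    for $surname in cts:element-value-match(
                      xs:QName("surname"),
                      fn:concat($letter, "*"),
                      (),   (: default options :)
                      cts:directory-query("/journal/APPLAB/vol_89/",
                                          "infinity"))
    return <surname>{ $surname }</surname>
  }</letter>
```

The last argument restricts the values to documents under the volume
directory, which takes the place of the cts:search step.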

If you need the whole name as one string that combines the three
elements, you might be able to build a lexicon (string range index) on
the author element itself, depending on whether the author element is
more complex than you described.  This will give you values equivalent
to calling fn:data on the author element.  Your query for the list of
unique authors would then be something like:

cts:element-values(xs:QName("author"))

You could also use the frequency feature in 3.2 to find how many of each
there are.  Lexicons are very cool.
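
A minimal sketch of the frequency idea, assuming a string range index on
the author element and again borrowing the directory URI from your
second query (the count attribute is just an illustrative name).  As I
understand it, cts:frequency reports by default the number of fragments
that contain the value, not the total number of occurrences:

```xquery
(: Sketch only: assumes a string range index on the author element. :)
for $name in cts:element-values(
               xs:QName("author"),
               (),   (: no start value: begin at the first entry :)
               (),   (: default options :)
               cts:directory-query("/journal/APPLAB/vol_89/",
                                   "infinity"))
return
  (: cts:frequency must be called on a value returned by a lexicon call :)
  <author count="{ cts:frequency($name) }">{ $name }</author>
```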

-Danny

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Helen Chen
Sent: Thursday, July 05, 2007 2:04 PM
To: [email protected]
Subject: [MarkLogic Dev General] performance question


I'm trying to create an author index list for the articles in a whole
volume, which means we may have 20000 articles in one volume, and each
article has more than one author. The author name element structure is:

<author><surname>..</surname><fname>..</fname><midname>..</midname>
the number of midname can be 0 or more than 1.

Since this is an index, the result will include all the articles under
the volume directory, and the important step for me is to create a
unique author name list.  But I found this takes a very long time:

   let $article := cts:search(/article, 
cts:directory-query("/journal/coden/vol/","infinity"))
   let $author := $article/front/authgrp/author

   for $surname in distinct-values($author/surname),
        $fname in distinct-values( $author[surname=$surname]/fname ),
        $midname in distinct-values(
                      if (fn:exists($author/middlename))
                      then fn:string-join(
                             for $m in $author[surname = $surname and fname = $fname]/middlename
                             return $m/text(),
                             " ")
                      else ()
                    )
   return
      <author>
        <surname>{ $surname }</surname>
        <fname>{ $fname }</fname>
         {
           if(fn:empty($midname)) then ()
           else  <midname>{$midname}</midname>
        }
  </author>

So I thought I could break up the surnames by starting letter, and I
tried the following logic: I loop through the 26 letters to get the
result. The problem is that a single letter is fairly quick (still
about 10 seconds), but all 26 letters somehow take about 8 minutes. That
is much better than the first solution, but it is still too long for me.

<result>
{
   let $article := cts:search(/article, 
cts:directory-query("/journal/APPLAB/vol_89/","infinity") )

   for $letter in ("a","b","c","d","e","f","g","h","i","j","k","l","m",
                 "n","o","p","q","r","s","t","u","v","w","x","y","z")
   let $author := $article/front/authgrp/author[fn:starts-with(surname,
                     $letter, "http://marklogic.com/collation//S2")]
   return
    for $surname in distinct-values($author/surname),
          $fname in distinct-values( $author[surname=$surname]/fname ),
          $midname in distinct-values(
                        if (fn:exists($author/middlename))
                        then fn:string-join(
                               for $m in $author[surname = $surname and fname = $fname]/middlename
                               return $m/text(),
                               " ")
                        else ()
                      )
       return
       <author>
        <surname>{ $surname }</surname>
        <fname>{ $fname }</fname>
        {
           if(fn:empty($midname)) then ()
           else  <midname>{$midname}</midname>
        }
       </author>
   
}
</result>


Does anyone have suggestions how I should deal with it?

Also another small problem: if there is no midname I would prefer no
output, but this code prints out an empty midname node when no midname
exists. Can someone tell me how to avoid printing the empty midname?


Thanks, Helen
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general