Hi folks,

 

I have developed an analysis tool that uses a lexicon on book titles to
obtain quick frequency counts.  It performs lexicon searches with
cts:element-value-match() and then runs a cts:search() using
cts:element-value-query() for each book title returned from the
lexicon.  The problem is that the frequency count for each book title does
not match the number of occurrences of that title found by the cts:search().
The first book title lexicon was configured with the collation
"http://marklogic.com/collation//S1/AS/T00BB".  I wrote the following XQuery
to demonstrate the problem:

 

xquery version "1.0-ml";

let $title := "Some Title"

let $matches := cts:element-value-match(xs:QName("BookTitle"), $title,
    (
        "collation=http://marklogic.com/collation//S1/AS/T00BB",
        (:"collation=http://marklogic.com/collation/",:)
        "case-insensitive", "diacritic-insensitive", "item-order",
        "ascending"
    ))

let $occurrences := count($matches)

for  $match in $matches
    
    let $results :=
        cts:search(xdmp:directory("/DirectoryURI/", "infinity")/Record,
            cts:element-value-query(xs:QName("BookTitle"), $title,
                ("case-insensitive", "punctuation-insensitive",
                 "diacritic-insensitive")))
    let $count := count($results)
    
    let $remainder := cts:remainder($results[1])
    
    return element match {
        attribute frequency {cts:frequency($match)},
        attribute count {$count},
        attribute remainder {$remainder},
        $match
   }



Here are the results:


<match frequency="981" count="1003" remainder="1003">Some Title</match>



I built another lexicon using the root collation and after it had reindexed,
I obtained the following results (using the root collation in the
cts:element-value-match() lexicon search function options):

 

<match frequency="20" count="1003" remainder="1003">Some Title</match>
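
For reference, the lexicon call for this second run was essentially the
first query with the commented-out root collation option swapped in.  A
rough sketch, assuming the other options were left unchanged and leaving
out the cts:search() part:

xquery version "1.0-ml";

(: Sketch only: same lexicon lookup as in the first query, but with the
   root collation.  The remaining options are assumed to be unchanged. :)
let $title := "Some Title"

for $match in cts:element-value-match(xs:QName("BookTitle"), $title,
    (
        "collation=http://marklogic.com/collation/",
        "case-insensitive", "diacritic-insensitive", "item-order",
        "ascending"
    ))

return element match {
    attribute frequency {cts:frequency($match)},
    $match
}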



Wow, what a difference the collation makes!  I'm a little perplexed as to
how to make the lexicon frequency match the actual number of occurrences,
and how to adjust the collation and lexicon search options so that they
yield the same counts as cts:search() with cts:element-value-query().  I
have reviewed the collation concepts at
http://userguide.icu-project.org/collation/concepts but I can't quite
determine what to do to ensure that the counts line up.

 

Tim Meagher - AAOM Consulting

 
