Hi folks,
I have developed an analysis tool in which I use a lexicon of book titles to obtain quick frequency counts, perform lexicon searches with cts:element-value-match(), and then run a cts:search() using cts:element-value-query() for each resulting book title in the lexicon. The problem is that the frequency count for each book title does not match the number of occurrences of that book title returned by cts:search().

The first book title lexicon was configured with the collation "http://marklogic.com/collation//S1/AS/T00BB". I wrote the following XQuery to demonstrate the problem:

    xquery version "1.0-ml";
    let $title := "Some Title"
    let $matches := cts:element-value-match(xs:QName("BookTitle"), $title, (
        "collation=http://marklogic.com/collation//S1/AS/T00BB",
        (:"collation=http://marklogic.com/collation/",:)
        "case-insensitive",
        "diacritic-insensitive",
        "item-order",
        "ascending"
      ))
    let $occurences := count($matches)
    for $match in $matches
    let $results := cts:search(
        xdmp:directory("/DirectoryURI/", "infinity")/Record,
        cts:element-value-query(xs:QName("BookTitle"), $title,
          ("case-insensitive", "punctuation-insensitive", "diacritic-insensitive")))
    let $count := count($results)
    let $remainder := cts:remainder($results[1])
    return
      element match {
        attribute frequency {cts:frequency($match)},
        attribute count {$count},
        attribute remainder {$remainder},
        $match
      }

Here are the results:

    <match frequency="981" count="1003" remainder="1003">Some Title</match>

I then built another lexicon using the root collation. After it had reindexed, I obtained the following results (using the root collation in the cts:element-value-match() options):

    <match frequency="20" count="1003" remainder="1003">Some Title</match>

Wow, what a difference the collation makes!
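One further check that might narrow this down is to compare the two numbers under the same collation and the same scope. The following is only a sketch, not something I have verified: it assumes the BookTitle range index behind the lexicon uses the S1 collation, that restricting both sides to "/DirectoryURI/" is the right scope, and that an unfiltered estimate is an acceptable stand-in for count(). The variable names $collation and $scope are mine.

```xquery
xquery version "1.0-ml";
(: Sketch: compare the lexicon frequency with a search estimate,
   using the same collation and the same directory scope on both sides. :)
let $title := "Some Title"
(: Assumption: the range index on BookTitle was built with this collation. :)
let $collation := "http://marklogic.com/collation//S1/AS/T00BB"
(: Assumption: both the lexicon lookup and the search should be limited to this directory. :)
let $scope := cts:directory-query("/DirectoryURI/", "infinity")
for $match in cts:element-value-match(
    xs:QName("BookTitle"), $title,
    (fn:concat("collation=", $collation), "fragment-frequency"),
    $scope)
(: The range query resolves against the same index (and collation) as the lexicon. :)
let $query := cts:and-query((
    $scope,
    cts:element-range-query(xs:QName("BookTitle"), "=", $match,
      fn:concat("collation=", $collation))))
return
  element match {
    attribute frequency {cts:frequency($match)},
    attribute estimate {xdmp:estimate(cts:search(fn:doc(), $query))},
    $match
  }
```

If frequency and estimate agree here, the mismatch above would presumably come from cts:element-value-query() comparing values differently than the lexicon collation does, or from the lexicon lookup not being limited to the directory, rather than from the lexicon itself.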
I'm a little perplexed as to how to make the frequency count match the actual number of occurrences, and how to adjust the collation and lexicon search so that they yield the same number of results as cts:search() with cts:element-value-query(). I have reviewed the collation concepts at http://userguide.icu-project.org/collation/concepts but I can't quite determine what to do to ensure that the counts line up.

Tim Meagher - AAOM Consulting
