Why not try a punctuation-insensitive and case-insensitive element-value query using "smith john"?
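
A rough sketch of that suggestion follows. It assumes AUTHOR is in no namespace, that names are stored in the form "smith, john", and that counting with xdmp:estimate over doc() suits your setup; none of this has been run against your data.

```xquery
(: Sketch only: assumes AUTHOR is in no namespace and holds values like "smith, john". :)
xdmp:estimate(
  cts:search(
    doc(),
    cts:element-value-query(
      xs:QName("AUTHOR"),
      (: word order follows the stored value: surname first :)
      "smith john",
      (: ignore case and the comma between the two words :)
      ("case-insensitive", "punctuation-insensitive")
    )
  )
)
```

Because an element-value query compares against the whole text of a single AUTHOR element, it should not match a document where "john" and "smith" appear in different AUTHOR elements. The trade-off is that it would also miss values with extra words, such as a middle initial.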
_____

From: [email protected] [mailto:[email protected]] On Behalf Of Dave Feldmeier
Sent: Thursday, December 11, 2008 12:47 PM
To: [email protected]
Subject: [MarkLogic Dev General] Re: General Digest, Vol 54, Issue 15

Mike,

"(AUTHOR:john AUTHOR:smith)" is doing what I expect, but I need a different query in addition to this one. Let me give you some context. Suppose that we have two documents:

    <DOCUMENT>
      <AUTHOR>smith, john</AUTHOR>
    </DOCUMENT>

    <DOCUMENT>
      <AUTHOR>jones, john</AUTHOR>
      <AUTHOR>smith, steve</AUTHOR>
    </DOCUMENT>

The problem with the query above is that it will match both documents, but I want a query that matches only the first document. In other words, I want the AND to be performed within any single AUTHOR field. The obvious syntax for this is: "AUTHOR:(john smith)". However, lib-parse appears to parse this as if the query were as follows: 'AUTHOR:"" (john smith)' - in other words, it's looking for an AUTHOR field with "" (and there is none), so the entire query returns zero results. What I expected it to do is something like:

    (cts:element-word-query(QName("AUTHOR"), cts:and-query(cts:word-query("john"), cts:word-query("smith"))))

When I run the first query against my database, I get 7210 results. When I run the second query against my database, I get 2691 results, which is exactly what I'd expect, because the second query returns fewer results than the first query. Here's the actual query:

    for $i in xdmp:estimate(cts:search(input(),
      cts:element-query(xs:QName("AUTHOR"),
        cts:and-query(
          cts:and-query(
            (
              cts:word-query("john"),
              cts:word-query("smith")
            )
          )
        )
      )
    ))
    return $

This seems like such an obvious thing to do that I can't believe that I'm the first one to do it, so I was hoping that someone else already had implemented something similar.

I'm using MarkLogic 4.0-2.2 and lib-search-3.2-2008-05-13.1 (it looks like you're using a newer version - where do I get it?).
My lib-parser-custom.xqy has some additional code to deal with range queries (appended below). At first glance, it shouldn't have an impact.

-Dave

23a24,25
> import module namespace map = "http://sirma.marklogic.com/lib/map" at "/lib/map.xqy"
>
55a58
> (:
82a86
> :)
114a119,210
> define function custom:hasRange($text)
> {
>   if( contains($text,"~>") or
>       contains($text,"~<") or
>       contains($text,"~=") or
>       contains($text,"~>=") or
>       contains($text,"~<=") ) then true()
>   else false()
> }
>
> (: [email protected] :)
> define function custom:rangeQuery($qname,$text,$type)
> {
>   let $tokens := tokenize($text,"~")
>   let $optr := $tokens[2]
>   let $val := custom:castToDataType($tokens[1],$type)
>   return cts:element-range-query(xs:QName($qname),$optr,$val, element cts:option { "collation=http://marklogic.com/collation//MO" })
> }
>
> define function custom:element-value-query(
>   $qnames as xs:QName*,
>   $text as xs:string*,
>   $options as xs:string*,
>   $weight as xs:double
> ) {
>   let $queries :=
>     for $i in $qnames
>     let $mapping := $map:map//*:mappi...@qname = $i]
>     let $parent := string($mapping/@parent)
>     let $range := string($mapping/@isRange)
>     let $type := string($mapping/@type)
>     return
>       if($range and custom:hasRange($text)) then custom:rangeQuery($i,$text,$type)
>       else if($parent) then
>         cts:element-query(xs:QName($parent), cts:word-query($text, $options, $weight))
>       else
>         cts:element-value-query(xs:QName($i), $text, $options, $weight)
>   return
>     if(count($queries) gt 1) then cts:or-query($queries) else $queries
> }
>
> define function custom:element-word-query(
>   $qnames as xs:QName*,
>   $text as xs:string*,
>   $options as xs:string*,
>   $weight as xs:double
> ) {
>   let $queries :=
>     for $i in $qnames
>     let $mapping := $map:map//*:mappi...@qname = $i]
>     let $parent := string($mapping/@parent)
>     let $range := string($mapping/@isRange)
>     let $type := string($mapping/@type)
>     return
>       if($range and custom:hasRange($text)) then custom:rangeQuery($i,$text,$type)
>       else
>         cts:element-query(xs:QName($i), cts:word-query($text, $options, $weight))
>       (:if($parent) then
>       else
>         cts:element-word-query(xs:QName($i), $text, $options, $weight):)
>   return
>     if(count($queries) gt 1) then cts:or-query($queries) else $queries
> }
>
> define function custom:element-query($qnames, $queries, $options){
>   let $queries :=
>     for $i in $qnames
>     return
>       cts:element-query(xs:QName($i), $queries, $options)
>   return
>     if(count($queries) gt 1) then cts:or-query($queries) else $queries
> }
>
> define function custom:castToDataType($value,$type)
> {
>   try
>   {
>     let $value := replace($value,'"','')
>     return
>       if($type="dateTime") then xs:dateTime(xs:date($value)) else
>       if($type="unsignedLong") then xs:unsignedLong($value) else
>       if($type="int") then xs:int($value) else
>       if($type="unsignedInt") then xs:unsignedInt($value) else
>       if($type="") then $value else
>       lp:error( concat("Unknown type",":"),$value)
>   }
>   catch($ex)
>   {
>     lp:error(concat("Invalid dataType",":"),$value)
>   }
> }
>

Message: 6
Date: Thu, 11 Dec 2008 08:44:57 -0800
From: Michael Blakeley <[email protected]>
Subject: Re: [MarkLogic Dev General] lib-parse - how to search for boolean expressions within a field
To: General Mark Logic Developer Discussion <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset=UTF-8; format=flowed

lib-parser tries to emulate google's syntax. It does not implement "AUTHOR:(john AND smith)" because that isn't google syntax. You are, of course, free to write your own parser, and you can even use the apache-licensed lib-parser.xqy code as a starting point.

But first let's look into "(AUTHOR:john AUTHOR:smith)" some more. I suspect that you might be tickling a bug or misunderstanding an option (or perhaps you had an extra space before "john"?).
Here's my test, using MarkLogic Server 4.0-2.2 and lib-parser version 3.2-2008-10-08.1 with the built-in code mapping:

    import module namespace lp="http://www.marklogic.com/ps/lib/lib-parser" at "lib-parser.xqy";
    lp:get-cts-query('title:foo title:bar')
    =>
    cts:and-query((cts:element-word-query(QName("", "title"), "foo", ("lang=en"), 1), cts:element-word-query(QName("", "title"), "bar", ("lang=en"), 1)), ())

That's what I'd expect: the output is an and-query of element-word-query terms (*not* element-query). Note that I omitted AND, because it's a no-op (as with google's syntax).

From the collection-query in your sample output, it's clear that your test case must be more complex than mine. Can you provide a full test case? What result did you expect, and what are you getting? Which version of the server are you using? What version of lib-parser.xqy do you have? Have you made any changes to lib-parser-custom.xqy?

-- Mike

On 2008-12-11 00:07, Dave Feldmeier wrote:

I am using lib-parse, and I can do searches like "(AUTHOR: john AND AUTHOR:smith)". However, this doesn't do the right thing if there are multiple authors. I want to do a search like "AUTHOR:(john AND smith)", but lib-parse does not seem to construct the correct expression for this search.

It looks like lib-parse is creating an and-query that combines an element-query with an empty word query and a second and-query of two word-queries, rather than creating an element-query with an and-query that contains two word-queries (see the attached XML of the query).

Since the recursive parser for boolean expressions already exists in lib-parse, is there an easy way to fix lib-parse to create the correct search expression when doing a search with boolean expressions within a field?

Thanks.
-Dave

<lib-query>AUTHOR:(john AND smith)</lib-query>
<query>
  <cts:and-query xmlns:cts="http://marklogic.com/cts">
    <cts:and-query>
      <cts:element-query>
        <cts:element>AUTHOR</cts:element>
        <cts:word-query />
      </cts:element-query>
      <cts:and-query>
        <cts:word-query>
          <cts:text xml:lang="en">john</cts:text>
        </cts:word-query>
        <cts:word-query>
          <cts:text xml:lang="en">smith</cts:text>
        </cts:word-query>
      </cts:and-query>
    </cts:and-query>
    <cts:collection-query>
      <cts:uri>HEAD</cts:uri>
    </cts:collection-query>
  </cts:and-query>
</query>
