Why not try a punctuation-insensitive and case-insensitive element-value
query using "smith john"?
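
A minimal sketch of that suggestion, assuming the 1.0-ml dialect and searching all documents with fn:doc() (both are my assumptions, not something stated in the thread). It should match an AUTHOR whose entire value is "smith, john", but not "john smith" or "smith, john q.", because an element-value-query compares the whole value in order:

```xquery
xquery version "1.0-ml";

(: Sketch only: match AUTHOR elements whose complete value is
   "smith john", ignoring case and punctuation such as the comma. :)
cts:search(
  fn:doc(),
  cts:element-value-query(
    xs:QName("AUTHOR"),
    "smith john",
    ("case-insensitive", "punctuation-insensitive")
  )
)
```

If word order or extra name parts vary in your data, this will be too strict and an element-query with word-queries is probably the better fit.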

 

  _____  

From: [email protected]
[mailto:[email protected]] On Behalf Of Dave Feldmeier
Sent: Thursday, December 11, 2008 12:47 PM
To: [email protected]
Subject: [MarkLogic Dev General] Re: General Digest, Vol 54, Issue 15

 

Mike,

"(AUTHOR:john AUTHOR:smith)" is doing what I expect, but I need a different
query in addition to this one. Let me give you some context. Suppose that we
have two documents:

<DOCUMENT>
    <AUTHOR>smith, john</AUTHOR>
</DOCUMENT>

<DOCUMENT>
    <AUTHOR>jones, john</AUTHOR>
    <AUTHOR>smith, steve</AUTHOR>
</DOCUMENT>

The problem with the query above is that it will match both documents, but I
want a query that matches only the first document. In other words, I want
the AND to be performed within any single AUTHOR field.
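
To make the difference concrete, here is a small sketch that checks both sample documents in memory with cts:contains. It assumes the 1.0-ml dialect and that cts:contains is available in your server version; the variable names are made up for illustration. The document-level form is the and-query of element-word-queries that lib-parser produces for "(AUTHOR:john AUTHOR:smith)"; the element-level form requires both words inside one AUTHOR element, so it should accept only the first document.

```xquery
xquery version "1.0-ml";

(: The two sample documents from the message, built in memory. :)
let $doc1 :=
  <DOCUMENT>
    <AUTHOR>smith, john</AUTHOR>
  </DOCUMENT>
let $doc2 :=
  <DOCUMENT>
    <AUTHOR>jones, john</AUTHOR>
    <AUTHOR>smith, steve</AUTHOR>
  </DOCUMENT>

(: Document-level AND: each word may come from a different AUTHOR. :)
let $doc-level :=
  cts:and-query((
    cts:element-word-query(xs:QName("AUTHOR"), "john"),
    cts:element-word-query(xs:QName("AUTHOR"), "smith")
  ))

(: Element-level AND: both words must occur in the same AUTHOR. :)
let $element-level :=
  cts:element-query(
    xs:QName("AUTHOR"),
    cts:and-query((cts:word-query("john"), cts:word-query("smith")))
  )

for $doc in ($doc1, $doc2)
return
  <result>
    <doc-level>{ cts:contains($doc, $doc-level) }</doc-level>
    <element-level>{ cts:contains($doc, $element-level) }</element-level>
  </result>
```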

The obvious syntax for this is: "AUTHOR:(john smith)". However, lib-parse
appears to parse this as if the query were as follows: 'AUTHOR:"" (john
smith)' - in other words, it's looking for an AUTHOR field with "" (and
there is none), so the entire query returns zero results.

What I expected it to do is something like:

cts:element-query(xs:QName("AUTHOR"),
cts:and-query((cts:word-query("john"), cts:word-query("smith"))))

When I run the first query against my database, I get 7210 results. When I
run the second query against my database, I get 2691 results, which is
what I'd expect: the second query is more restrictive, so it should return
fewer results than the first query.

 Here's the actual query:

for $i in xdmp:estimate(cts:search(input(),
  cts:element-query(xs:QName("AUTHOR"),
    cts:and-query(
      cts:and-query(
        (
          cts:word-query("john"),
          cts:word-query("smith")
        )
      )
    )
  )
))
return $i


This seems like such an obvious thing to do that I can't believe that I'm
the first one to do it, so I was hoping that someone else already had
implemented something similar.

I'm using MarkLogic 4.0-2.2 and lib-search-3.2-2008-05-13.1 (it looks like
you're using a newer version - where do I get it?). My lib-parser-custom.xqy
has some additional code to deal with range queries (appended below). At
first glance, it shouldn't have an impact.

                                           -Dave

23a24,25
> import module namespace map = "http://sirma.marklogic.com/lib/map" at "/lib/map.xqy"
>
55a58
> (:
82a86
> :)
114a119,210
> define function custom:hasRange($text)
> {
>   if( contains($text,"~>") or
>       contains($text,"~<") or
>       contains($text,"~=") or
>       contains($text,"~>=") or
>       contains($text,"~<=")  ) then true()
>   else false()
> }
>
> (: [email protected] :)
> define function custom:rangeQuery($qname,$text,$type)
> {
>       let $tokens := tokenize($text,"~")
>       let $optr := $tokens[2]
>       let $val := custom:castToDataType($tokens[1],$type)
>       return cts:element-range-query(xs:QName($qname),$optr,$val,
>         element cts:option { "collation=http://marklogic.com/collation//MO" })
> }
>
> define function custom:element-value-query(
>   $qnames as xs:QName*,
>   $text as xs:string*,
>   $options as xs:string*,
>   $weight as xs:double
> ) {
>     let $queries :=
>         for $i in $qnames
>         let $mapping := $map:map//*:mappi...@qname = $i]
>         let $parent := string($mapping/@parent)
>               let $range := string($mapping/@isRange)
>               let $type := string($mapping/@type)
>         return
>           if($range and custom:hasRange($text)) then custom:rangeQuery($i,$text,$type)
>             else if($parent) then
>                 cts:element-query(xs:QName($parent), cts:word-query($text, $options, $weight))
>             else
>                 cts:element-value-query(xs:QName($i), $text, $options, $weight)
>     return
>         if(count($queries) gt 1) then cts:or-query($queries) else $queries
> }
>
> define function custom:element-word-query(
>   $qnames as xs:QName*,
>   $text as xs:string*,
>   $options as xs:string*,
>   $weight as xs:double
> ) {
>     let $queries :=
>         for $i in $qnames
>         let $mapping := $map:map//*:mappi...@qname = $i]
>         let $parent := string($mapping/@parent)
>               let $range := string($mapping/@isRange)
>               let $type := string($mapping/@type)
>         return
>             if($range and custom:hasRange($text)) then custom:rangeQuery($i,$text,$type)
>           else
>             cts:element-query(xs:QName($i), cts:word-query($text, $options, $weight))
>             (:if($parent) then
>             else
>                 cts:element-word-query(xs:QName($i), $text, $options, $weight):)
>     return
>         if(count($queries) gt 1) then cts:or-query($queries) else $queries
> }
>
> define function custom:element-query($qnames, $queries, $options){
>     let $queries :=
>         for $i in $qnames
>         return
>             cts:element-query(xs:QName($i), $queries, $options)
>     return
>         if(count($queries) gt 1) then cts:or-query($queries) else $queries
> }
>
> define function custom:castToDataType($value,$type)
> {
>  try
>  {
>    let $value := replace($value,'"','')
>    return
>    if($type="dateTime") then xs:dateTime(xs:date($value)) else
>    if($type="unsignedLong") then xs:unsignedLong($value) else
>    if($type="int") then xs:int($value) else
>    if($type="unsignedInt") then xs:unsignedInt($value) else
>    if($type="") then $value else
>    lp:error( concat("Unknown type",":"),$value)
>  }
>  catch($ex)
>  {
>    lp:error(concat("Invalid dataType",":"),$value)
>  }
> }
>




 
Message: 6
Date: Thu, 11 Dec 2008 08:44:57 -0800
From: Michael Blakeley <[email protected]>
Subject: Re: [MarkLogic Dev General] lib-parse - how to search for
        boolean expressions within a field
To: General Mark Logic Developer Discussion <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset=UTF-8; format=flowed
 
lib-parser tries to emulate google's syntax. It does not implement 
"AUTHOR:(john AND smith)" because that isn't google syntax. You are, of 
course, free to write your own parser, and you can even use the 
apache-licensed lib-parser.xqy code as a starting point.
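
As a rough starting point for that kind of custom code, the sketch below builds the element-level query directly from a field name and a space-separated term string, bypassing the parser. The function local:field-and-query is hypothetical (it is not part of lib-parser), and the sketch assumes the 1.0-ml dialect and a field name with no namespace:

```xquery
xquery version "1.0-ml";

(: Hypothetical helper, not part of lib-parser: require every word
   in $terms to occur inside a single element named $field. :)
declare function local:field-and-query(
  $field as xs:string,
  $terms as xs:string
) as cts:query
{
  cts:element-query(
    xs:QName($field),
    cts:and-query(
      (: one word-query per whitespace-separated term :)
      for $term in fn:tokenize(fn:normalize-space($terms), " ")
      return cts:word-query($term)
    )
  )
};

(: Roughly what a query like AUTHOR:(john smith) would need to become. :)
local:field-and-query("AUTHOR", "john smith")
```

Note that an empty $terms string yields an and-query with no sub-queries, which you would probably want to reject before searching.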
 
But first let's look into "(AUTHOR:john AUTHOR:smith)" some more. I 
suspect that you might be tickling a bug or misunderstanding an option 
(or perhaps you had an extra space before "john"?). Here's my test, 
using MarkLogic Server 4.0-2.2 and lib-parser version 3.2-2008-10-08.1 
with the built-in code mapping:
 
import module namespace lp = "http://www.marklogic.com/ps/lib/lib-parser"
   at "lib-parser.xqy";
lp:get-cts-query('title:foo title:bar')
=>
cts:and-query((cts:element-word-query(QName("", "title"), "foo", 
("lang=en"), 1), cts:element-word-query(QName("", "title"), "bar", 
("lang=en"), 1)), ())
 
That's what I'd expect: the output is an and-query of element-word-query 
terms (*not* element-query). Note that I omitted AND, because it's a 
no-op (as with google's syntax).
 
 From the collection-query in your sample output, it's clear that your 
test case must be more complex than mine. Can you provide a full test 
case? What result did you expect, and what are you getting? Which 
version of the server are you using? What version of lib-parser.xqy do 
you have? Have you made any changes to lib-parser-custom.xqy?
 
-- Mike
 
On 2008-12-11 00:07, Dave Feldmeier wrote:
  

I am using lib-parse, and I can do searches like "(AUTHOR: john AND
AUTHOR:smith)". However, this doesn't do the right thing if there are
multiple authors. I want to do a search like "AUTHOR:(john AND smith)",
but lib-parse does not seem to construct the correct expression for this
search. It looks like lib-parse is creating an and-query that combines an
element-query with an empty word query and a second and-query of two
word-queries, rather than creating an element-query with an and-query
that contains two word-queries (see the attached XML of the query).
 
Since the recursive parser for boolean expressions already exists in
lib-parse, is there an easy way to fix lib-parse to create the correct
search expression when doing a search with boolean expressions within a
field? Thanks.
 
                                               -Dave
 
 
<lib-query>AUTHOR:(john AND smith)</lib-query>
<query>
     <cts:and-query xmlns:cts="http://marklogic.com/cts">
         <cts:and-query>
             <cts:element-query>
                 <cts:element>AUTHOR</cts:element>
                 <cts:word-query />
             </cts:element-query>
             <cts:and-query>
                 <cts:word-query>
                     <cts:text xml:lang="en">john</cts:text>
                 </cts:word-query>
                 <cts:word-query>
                     <cts:text xml:lang="en">smith</cts:text>
                 </cts:word-query>
             </cts:and-query>
         </cts:and-query>
         <cts:collection-query>
             <cts:uri>HEAD</cts:uri>
         </cts:collection-query>
     </cts:and-query>
</query>
    

 

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
