Will, you could also try https://github.com/mblakele/xqysp

Here are some tests comparing search:parse output with the query-eval module 
that is included as an example in xqysp. Since you have access to the abstract 
syntax tree from the parser, you can also customize the cts:query output to 
suit your needs. Integration with the search API is easy: instead of calling 
search:search, pass the cts:query to search:resolve.

import module namespace qe="com.blakeley.xqysp.query-eval" at "query-eval.xqy";
import module namespace search = "http://marklogic.com/appservices/search"
     at "/MarkLogic/appservices/search/search.xqy";

for $q in ('foo-bar', 'foo - bar', 'foo–bar', 'foo – bar')
return element test {
  attribute query { $q },
  search:parse($q),
  qe:parse($q) }
=>
<test query="foo-bar">
  <cts:word-query qtextref="cts:text" xmlns:cts="http://marklogic.com/cts">
    <cts:text>foo-bar</cts:text>
  </cts:word-query>
  <cts:word-query xmlns:cts="http://marklogic.com/cts">
    <cts:text xml:lang="en">foo-bar</cts:text>
  </cts:word-query>
</test>
<test query="foo - bar">
  <cts:and-query strength="20" qtextjoin="" qtextgroup="( )"
      xmlns:cts="http://marklogic.com/cts">
    <cts:word-query qtextref="cts:text">
      <cts:text>foo</cts:text>
    </cts:word-query>
    <cts:not-query qtextstart="-" strength="40">
      <cts:word-query qtextref="cts:text">
        <cts:text>bar</cts:text>
      </cts:word-query>
    </cts:not-query>
  </cts:and-query>
  <cts:and-query xmlns:cts="http://marklogic.com/cts">
    <cts:word-query>
      <cts:text xml:lang="en">foo</cts:text>
    </cts:word-query>
    <cts:not-query>
      <cts:word-query>
        <cts:text xml:lang="en">bar</cts:text>
      </cts:word-query>
    </cts:not-query>
  </cts:and-query>
</test>
<test query="foo–bar">
  <cts:word-query qtextref="cts:text" xmlns:cts="http://marklogic.com/cts">
    <cts:text>foo–bar</cts:text>
  </cts:word-query>
  <cts:and-query xmlns:cts="http://marklogic.com/cts">
    <cts:word-query>
      <cts:text xml:lang="en">foo</cts:text>
    </cts:word-query>
    <cts:word-query>
      <cts:text xml:lang="en">bar</cts:text>
    </cts:word-query>
  </cts:and-query>
</test>
<test query="foo – bar">
  <cts:and-query strength="20" qtextjoin="" qtextgroup="( )"
      xmlns:cts="http://marklogic.com/cts">
    <cts:word-query qtextref="cts:text">
      <cts:text>foo</cts:text>
    </cts:word-query>
    <cts:and-query strength="20" qtextjoin="" qtextgroup="( )">
      <cts:word-query qtextref="cts:text">
        <cts:text>–</cts:text>
      </cts:word-query>
      <cts:word-query qtextref="cts:text">
        <cts:text>bar</cts:text>
      </cts:word-query>
    </cts:and-query>
  </cts:and-query>
  <cts:and-query xmlns:cts="http://marklogic.com/cts">
    <cts:word-query>
      <cts:text xml:lang="en">foo</cts:text>
    </cts:word-query>
    <cts:word-query>
      <cts:text xml:lang="en">bar</cts:text>
    </cts:word-query>
  </cts:and-query>
</test>
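
The search:resolve hand-off mentioned above might look roughly like the following. This is an untested sketch, not part of xqysp: it assumes qe:parse returns either a cts:query value or its XML form, and the element name "wrapper" is only a placeholder used to get the query as an element, which is what search:resolve expects.

```xquery
xquery version "1.0-ml";

import module namespace qe = "com.blakeley.xqysp.query-eval" at "query-eval.xqy";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: One of the query strings from the tests above. :)
let $q := 'foo – bar'
(: Assumption: wrapping the result in a placeholder element and selecting
   its child yields the query as an XML element, whether qe:parse returns
   a cts:query value or an element. :)
let $query := <wrapper>{ qe:parse($q) }</wrapper>/*
(: Options are omitted here; pass a search:options node as the second
   argument if you need constraints, paging, or snippeting. :)
return search:resolve($query)
```

With the query-eval output shown above, the floating en dash never reaches the cts:query, so the search would be limited only by "foo" and "bar".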

-- Mike

On 26 Jan 2012, at 18:13 , Danny Sokolsky wrote:

> Sorry Will, I misunderstood (thought you meant the - was being treated as 
> negation).
> 
> Since you are pulling those from data that you know is in your database, how 
> about making the whole thing a phrase by surrounding it with quotes?  
> Here is an example of what I mean:
> 
> xquery version "1.0-ml";
> 
> import module namespace search = 
>  "http://marklogic.com/appservices/search"
>  at "/MarkLogic/appservices/search/search.xqy";
> 
> search:parse('"Venue &#x2015; Motion to Transfer"')
> 
> -Danny
> 
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Will Thompson
> Sent: Thursday, January 26, 2012 6:01 PM
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] en/em dashes punctuation?
> 
> Thanks Danny, but I'm not sure I follow. Maybe that was not the best 
> explanation. Rather than use dashes like hyphens, I just want a search for 
> something like "Venue &#x2015; Motion to Transfer" to ignore the dash when 
> parsed. It appears to be treating it like a word instead and is not ignored:
> 
> cts:and-query(
>  (cts:word-query("Venue", ("case-insensitive", "punctuation-insensitive", 
> "lang=en"), 1),
>   cts:word-query("&#x2015;", ("case-insensitive", "punctuation-insensitive", 
> "lang=en"), 1),
>   cts:word-query("Motion", ("case-insensitive", "punctuation-insensitive", 
> "lang=en"), 1),
>   cts:word-query("to", ("case-insensitive", "punctuation-insensitive", 
> "lang=en"), 1),
>   cts:word-query("Transfer", ("case-insensitive", "punctuation-insensitive", 
> "lang=en"), 1)),
>  ())
> 
> -Will
> 
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Danny Sokolsky
> Sent: Thursday, January 26, 2012 5:35 PM
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] en/em dashes punctuation?
> 
> Hi Will,
> 
> One thing you can do is change your search grammar to use a joiner other than 
> the negative sign.
> 
> Here is the default grammar:
> 
> http://docs.marklogic.com/5.0doc/docapp.xqy#display.xqy?fname=http://pubs/5.0doc/xml/search-dev-guide/search-api.xml%2344520
> 
> -Danny
> 
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Will Thompson
> Sent: Thursday, January 26, 2012 4:34 PM
> To: General MarkLogic Developer Discussion
> Subject: [MarkLogic Dev General] en/em dashes punctuation?
> 
> Our search autocomplete pulls from doc titles, some of which contain en or em 
> dashes. However, if the dash is "floating"- i.e.: "Venue - Motion to 
> Transfer" - search:parse parses it into the query, even though 
> <term-option>punctuation-insensitive</term-option> is included in the <term> 
> section of the search options node. I thought it may just be getting ignored 
> when it's evaluated but it's definitely limiting the query.
> 
> I can confirm they are punctuation: cts:tokenize("hyphen-en-em-bar―")[. 
> instance of cts:punctuation] => "- - - ―"
> 
> But is there an exception here (the same way hyphens are always parsed to 
> negate)? Do I just need to remove these from the query string before calling 
> search:parse? If there is a cleaner way, that would be great.
> 
> 
> Best,
> 
> Will
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
