Will, you could also try https://github.com/mblakele/xqysp
Here are some tests comparing search:parse output with the query-eval module that is included as an example in xqysp. Since you have access to the abstract syntax tree from the parser, you can also customize the cts:query output to suit your needs. Integration with the search API is easy: instead of calling search:search, pass the cts:query to search:resolve.

import module namespace qe="com.blakeley.xqysp.query-eval"
  at "query-eval.xqy";
import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

for $q in ('foo-bar', 'foo - bar', 'foo–bar', 'foo – bar')
return element test {
  attribute query { $q },
  search:parse($q),
  qe:parse($q) }

=>

<test query="foo-bar">
  <cts:word-query qtextref="cts:text" xmlns:cts="http://marklogic.com/cts">
    <cts:text>foo-bar</cts:text>
  </cts:word-query>
  <cts:word-query xmlns:cts="http://marklogic.com/cts">
    <cts:text xml:lang="en">foo-bar</cts:text>
  </cts:word-query>
</test>
<test query="foo - bar">
  <cts:and-query strength="20" qtextjoin="" qtextgroup="( )" xmlns:cts="http://marklogic.com/cts">
    <cts:word-query qtextref="cts:text">
      <cts:text>foo</cts:text>
    </cts:word-query>
    <cts:not-query qtextstart="-" strength="40">
      <cts:word-query qtextref="cts:text">
        <cts:text>bar</cts:text>
      </cts:word-query>
    </cts:not-query>
  </cts:and-query>
  <cts:and-query xmlns:cts="http://marklogic.com/cts">
    <cts:word-query>
      <cts:text xml:lang="en">foo</cts:text>
    </cts:word-query>
    <cts:not-query>
      <cts:word-query>
        <cts:text xml:lang="en">bar</cts:text>
      </cts:word-query>
    </cts:not-query>
  </cts:and-query>
</test>
<test query="foo–bar">
  <cts:word-query qtextref="cts:text" xmlns:cts="http://marklogic.com/cts">
    <cts:text>foo–bar</cts:text>
  </cts:word-query>
  <cts:and-query xmlns:cts="http://marklogic.com/cts">
    <cts:word-query>
      <cts:text xml:lang="en">foo</cts:text>
    </cts:word-query>
    <cts:word-query>
      <cts:text xml:lang="en">bar</cts:text>
    </cts:word-query>
  </cts:and-query>
</test>
<test query="foo – bar">
  <cts:and-query strength="20" qtextjoin="" qtextgroup="( )" xmlns:cts="http://marklogic.com/cts">
    <cts:word-query qtextref="cts:text">
      <cts:text>foo</cts:text>
    </cts:word-query>
    <cts:and-query strength="20" qtextjoin="" qtextgroup="( )">
      <cts:word-query qtextref="cts:text">
        <cts:text>–</cts:text>
      </cts:word-query>
      <cts:word-query qtextref="cts:text">
        <cts:text>bar</cts:text>
      </cts:word-query>
    </cts:and-query>
  </cts:and-query>
  <cts:and-query xmlns:cts="http://marklogic.com/cts">
    <cts:word-query>
      <cts:text xml:lang="en">foo</cts:text>
    </cts:word-query>
    <cts:word-query>
      <cts:text xml:lang="en">bar</cts:text>
    </cts:word-query>
  </cts:and-query>
</test>

-- Mike

On 26 Jan 2012, at 18:13, Danny Sokolsky wrote:

> Sorry Will, I misunderstood (thought you meant the - was being treated as
> negation).
>
> Since you are pulling those from data that you know is in your database, how
> about if you make the whole thing a phrase by surrounding it with quotes?
> Here is an example of what I mean:
>
> xquery version "1.0-ml";
>
> import module namespace search =
>   "http://marklogic.com/appservices/search"
>   at "/MarkLogic/appservices/search/search.xqy";
>
> search:parse('"Venue ― Motion to Transfer"')
>
> -Danny
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Will Thompson
> Sent: Thursday, January 26, 2012 6:01 PM
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] en/em dashes punctuation?
>
> Thanks Danny, but I'm not sure I follow. Maybe that was not the best
> explanation. Rather than use dashes like hyphens, I just want a search for
> something like "Venue ― Motion to Transfer" to ignore the dash when
> parsed.
> It appears to be treating it like a word instead and is not ignored:
>
> cts:and-query(
>   (cts:word-query("Venue", ("case-insensitive", "punctuation-insensitive",
>      "lang=en"), 1),
>    cts:word-query("―", ("case-insensitive", "punctuation-insensitive",
>      "lang=en"), 1),
>    cts:word-query("Motion", ("case-insensitive", "punctuation-insensitive",
>      "lang=en"), 1),
>    cts:word-query("to", ("case-insensitive", "punctuation-insensitive",
>      "lang=en"), 1),
>    cts:word-query("Transfer", ("case-insensitive", "punctuation-insensitive",
>      "lang=en"), 1)),
>   ())
>
> -Will
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Danny Sokolsky
> Sent: Thursday, January 26, 2012 5:35 PM
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] en/em dashes punctuation?
>
> Hi Will,
>
> One thing you can do is change your search grammar to use a joiner other than
> the negative sign.
>
> Here is the default grammar:
>
> http://docs.marklogic.com/5.0doc/docapp.xqy#display.xqy?fname=http://pubs/5.0doc/xml/search-dev-guide/search-api.xml%2344520
>
> -Danny
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Will Thompson
> Sent: Thursday, January 26, 2012 4:34 PM
> To: General MarkLogic Developer Discussion
> Subject: [MarkLogic Dev General] en/em dashes punctuation?
>
> Our search autocomplete pulls from doc titles, some of which contain en or em
> dashes. However, if the dash is "floating" - i.e.: "Venue - Motion to
> Transfer" - search:parse parses it into the query, even though
> <term-option>punctuation-insensitive</term-option> is included in the <term>
> section of the search options node. I thought it may just be getting ignored
> when it's evaluated but it's definitely limiting the query.
>
> I can confirm they are punctuation:
>
> cts:tokenize("hyphen-en-em-bar―")[.
> instance of cts:punctuation] => "- - - ―"
>
> But is there an exception here (the same way hyphens are always parsed to
> negate)? Do I just need to remove these from the query string before calling
> search:parse? If there is a cleaner way, that would be great.
>
> Best,
>
> Will

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
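
On the question raised in the thread of removing the dashes from the query string before calling search:parse, a minimal sketch might look like the following. It assumes that only en dashes, em dashes, and horizontal bars should be dropped, and only when they have whitespace on both sides. local:strip-floating-dashes is a hypothetical helper name, not part of the Search API. The hyphen-minus is deliberately left alone so that the default grammar's negation keeps working.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical helper (an assumption, not a Search API function):
   replace a run of en dashes (U+2013), em dashes (U+2014), or
   horizontal bars (U+2015) that has whitespace on both sides
   with a single space. :)
declare function local:strip-floating-dashes($q as xs:string) as xs:string
{
  fn:replace($q, "\s+[&#x2013;&#x2014;&#x2015;]+\s+", " ")
};

(: Parse each cleaned query string with the default search options. :)
for $q in ("Venue ― Motion to Transfer", "foo – bar", "foo–bar")
return search:parse(local:strip-floating-dashes($q))
```

Dash-joined terms with no surrounding whitespace, such as foo–bar, are not touched by this pattern, so how they are parsed still depends on search:parse or on whichever parser is used in its place.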
