For the empty word-query, the Search API lets you configure the behavior to be no-results (the default) or all-results. With all-results, an empty term gives an empty and-query, which is defined to match everything. For example:
import module namespace search = "http://marklogic.com/appservices/search"
    at "/MarkLogic/appservices/search/search.xqy";

search:parse("",
  <search:options xmlns="http://marklogic.com/appservices/search">
    <term>
      <empty apply="all-results"/>
      <term-option>diacritic-insensitive</term-option>
      <term-option>unwildcarded</term-option>
    </term>
  </search:options>)

Now if you have punctuation in there, that is not an empty term, so I am not sure why you are seeing an empty word-query for that. For example, the following:

search:parse("+")

returns

<cts:word-query qtextref="cts:text" xmlns:cts="http://marklogic.com/cts">
  <cts:text>+</cts:text>
</cts:word-query>

So maybe I am not understanding you?

-Danny

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Will Thompson
Sent: Friday, January 27, 2012 12:45 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] en/em dashes punctuation?

Danny - Yes, a good alternative to using replace(). I'm not sure if this sounds like a reasonable feature request, but I have some beef with the way search:parse behaves. First, I didn't realize that an empty word query - cts:word-query("") - would return zero results; I assumed it would return everything. But given that, if you call search:parse with the case-insensitive option, then any string input with floating punctuation will return a cts:word-query("&punctuation;",("case-insensitive")), which is equivalent to the empty word query and, when and-ed with the rest of the parsed query, will always return nothing. It's an edge case, but it seems undesirable in any scenario.

-Will

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Danny Sokolsky
Sent: Friday, January 27, 2012 10:25 AM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] en/em dashes punctuation?
Just a thought here: since those values are coming out of a lexicon (I am assuming), maybe your JavaScript code that displays them in the browser can remove the unwanted characters (and maybe lower-case them?) before it gives the suggestion to the UI? That way people can still search for those characters by typing them in.

-Danny

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Will Thompson
Sent: Thursday, January 26, 2012 7:20 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] en/em dashes punctuation?

Okay, very cool. Thanks.

-Will

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Thursday, January 26, 2012 7:17 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] en/em dashes punctuation?

The initial release did not include query-eval for direct cts:query output. I added that a bit later, to make it easier to get started. But the default query-eval module is still meant to be modified and customized for your application. It takes some shortcuts: for example, it assumes that foo=bar maps to cts:element-value-query(xs:QName('foo'), 'bar'). A sophisticated application might use a lookup table to map codes to QNames, and possibly mix in other options like collation, language, case-sensitivity, etc.

If you customize query-eval in interesting or useful ways, I am open to adding more sample evaluators to GitHub.

-- Mike

On 26 Jan 2012, at 19:04 , Will Thompson wrote:

> Thanks Mike - Your parser was on my radar, but I did not realize it returns ML query syntax (I thought you had to DIY to get it from your AST to ML).
>
> As a quick fix I may have to just replace($querystring, "&#x2013;|&#x2014;", ""), but I will definitely give xqysp a second look.
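A minimal sketch of that quick fix, written as plain XQuery 1.0 so it runs outside MarkLogic as well. The function name `local:strip-dashes` is hypothetical. It assumes the characters to drop are the en dash (U+2013), the em dash (U+2014) and the horizontal bar (U+2015, the character in the "Venue ― Motion to Transfer" title); the ASCII hyphen-minus is deliberately left alone because the default search:parse grammar treats a leading "-" as negation. Each dash is replaced with a space rather than the empty string, so "foo–bar" becomes the two words "foo bar" (as qe:parse splits it in Mike's tests) instead of the single word "foobar".

```xquery
xquery version "1.0";

(: Hypothetical helper: replace en dash (U+2013), em dash (U+2014) and
   horizontal bar (U+2015) with a space, then collapse runs of whitespace.
   The ASCII hyphen-minus is left untouched because the default
   search:parse grammar uses a leading "-" for negation. :)
declare function local:strip-dashes($qtext as xs:string) as xs:string
{
  fn:normalize-space(fn:replace($qtext, "[&#x2013;&#x2014;&#x2015;]", " "))
};

(: Returns "Venue Motion to Transfer" :)
local:strip-dashes("Venue &#x2015; Motion to Transfer")
```

In MarkLogic the cleaned string would then be handed to search:parse (or search:search) in place of the raw query text.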
>
> -Will
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Michael Blakeley
> Sent: Thursday, January 26, 2012 6:46 PM
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] en/em dashes punctuation?
>
> Will, you could also try https://github.com/mblakele/xqysp
>
> Here are some tests comparing search:parse output with the query-eval module that is included as an example in xqysp. Since you have access to the abstract syntax tree from the parser, you can also customize the cts:query output to suit your needs. Integration with the search API is easy: instead of calling search:search, pass the cts:query to search:resolve.
>
> import module namespace qe="com.blakeley.xqysp.query-eval" at
>   "query-eval.xqy";
> import module namespace search = "http://marklogic.com/appservices/search"
>   at "/MarkLogic/appservices/search/search.xqy";
>
> for $q in ('foo-bar', 'foo - bar', 'foo–bar', 'foo – bar')
> return element test {
>   attribute query { $q },
>   search:parse($q),
>   qe:parse($q) }
> =>
> <test query="foo-bar">
>   <cts:word-query qtextref="cts:text" xmlns:cts="http://marklogic.com/cts">
>     <cts:text>foo-bar</cts:text>
>   </cts:word-query>
>   <cts:word-query xmlns:cts="http://marklogic.com/cts">
>     <cts:text xml:lang="en">foo-bar</cts:text>
>   </cts:word-query>
> </test>
> <test query="foo - bar">
>   <cts:and-query strength="20" qtextjoin="" qtextgroup="( )" xmlns:cts="http://marklogic.com/cts">
>     <cts:word-query qtextref="cts:text">
>       <cts:text>foo</cts:text>
>     </cts:word-query>
>     <cts:not-query qtextstart="-" strength="40">
>       <cts:word-query qtextref="cts:text">
>         <cts:text>bar</cts:text>
>       </cts:word-query>
>     </cts:not-query>
>   </cts:and-query>
>   <cts:and-query xmlns:cts="http://marklogic.com/cts">
>     <cts:word-query>
>       <cts:text xml:lang="en">foo</cts:text>
>     </cts:word-query>
>     <cts:not-query>
>       <cts:word-query>
>         <cts:text xml:lang="en">bar</cts:text>
>       </cts:word-query>
>     </cts:not-query>
>   </cts:and-query>
> </test>
> <test query="foo–bar">
>   <cts:word-query qtextref="cts:text" xmlns:cts="http://marklogic.com/cts">
>     <cts:text>foo–bar</cts:text>
>   </cts:word-query>
>   <cts:and-query xmlns:cts="http://marklogic.com/cts">
>     <cts:word-query>
>       <cts:text xml:lang="en">foo</cts:text>
>     </cts:word-query>
>     <cts:word-query>
>       <cts:text xml:lang="en">bar</cts:text>
>     </cts:word-query>
>   </cts:and-query>
> </test>
> <test query="foo – bar">
>   <cts:and-query strength="20" qtextjoin="" qtextgroup="( )" xmlns:cts="http://marklogic.com/cts">
>     <cts:word-query qtextref="cts:text">
>       <cts:text>foo</cts:text>
>     </cts:word-query>
>     <cts:and-query strength="20" qtextjoin="" qtextgroup="( )">
>       <cts:word-query qtextref="cts:text">
>         <cts:text>–</cts:text>
>       </cts:word-query>
>       <cts:word-query qtextref="cts:text">
>         <cts:text>bar</cts:text>
>       </cts:word-query>
>     </cts:and-query>
>   </cts:and-query>
>   <cts:and-query xmlns:cts="http://marklogic.com/cts">
>     <cts:word-query>
>       <cts:text xml:lang="en">foo</cts:text>
>     </cts:word-query>
>     <cts:word-query>
>       <cts:text xml:lang="en">bar</cts:text>
>     </cts:word-query>
>   </cts:and-query>
> </test>
>
> -- Mike
>
> On 26 Jan 2012, at 18:13 , Danny Sokolsky wrote:
>
>> Sorry Will, I misunderstood (thought you meant the - was being treated as negation).
>>
>> Since you are pulling those from data that you know is in your database, how about if you make the whole thing a phrase by surrounding it with quotes.
>> Here is an example of what I mean:
>>
>> xquery version "1.0-ml";
>>
>> import module namespace search =
>>   "http://marklogic.com/appservices/search"
>>   at "/MarkLogic/appservices/search/search.xqy";
>>
>> search:parse('"Venue ― Motion to Transfer"')
>>
>> -Danny
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Will Thompson
>> Sent: Thursday, January 26, 2012 6:01 PM
>> To: General MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] en/em dashes punctuation?
>>
>> Thanks Danny, but I'm not sure I follow. Maybe that was not the best explanation. Rather than use dashes like hyphens, I just want a search for something like "Venue ― Motion to Transfer" to ignore the dash when parsed. It appears to be treating it like a word instead, and it is not ignored:
>>
>> cts:and-query(
>>   (cts:word-query("Venue", ("case-insensitive", "punctuation-insensitive", "lang=en"), 1),
>>    cts:word-query("―", ("case-insensitive", "punctuation-insensitive", "lang=en"), 1),
>>    cts:word-query("Motion", ("case-insensitive", "punctuation-insensitive", "lang=en"), 1),
>>    cts:word-query("to", ("case-insensitive", "punctuation-insensitive", "lang=en"), 1),
>>    cts:word-query("Transfer", ("case-insensitive", "punctuation-insensitive", "lang=en"), 1)),
>>   ())
>>
>> -Will
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Danny Sokolsky
>> Sent: Thursday, January 26, 2012 5:35 PM
>> To: General MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] en/em dashes punctuation?
>>
>> Hi Will,
>>
>> One thing you can do is change your search grammar to use a joiner other than the negative sign.
>>
>> Here is the default grammar:
>>
>> http://docs.marklogic.com/5.0doc/docapp.xqy#display.xqy?fname=http://pubs/5.0doc/xml/search-dev-guide/search-api.xml%2344520
>>
>> -Danny
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Will Thompson
>> Sent: Thursday, January 26, 2012 4:34 PM
>> To: General MarkLogic Developer Discussion
>> Subject: [MarkLogic Dev General] en/em dashes punctuation?
>>
>> Our search autocomplete pulls from doc titles, some of which contain en or em dashes. However, if the dash is "floating" (i.e., "Venue - Motion to Transfer"), search:parse parses it into the query, even though <term-option>punctuation-insensitive</term-option> is included in the <term> section of the search options node. I thought it might just be getting ignored when it's evaluated, but it's definitely limiting the query.
>>
>> I can confirm they are punctuation: cts:tokenize("hyphen-en-em-bar―")[. instance of cts:punctuation] => "- - - ―"
>>
>> But is there an exception here (the same way hyphens are always parsed to negate)? Do I just need to remove these from the query string before calling search:parse? If there is a cleaner way, that would be great.
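As an alternative to removing characters, Danny's later suggestion of quoting the whole title as a phrase can be wrapped in a small helper for arbitrary autocomplete titles. This is a sketch in plain XQuery 1.0; the function name `local:as-phrase` is hypothetical, and dropping any double quotes already inside the title is an assumption (left in place, they would close the phrase early). Whether the quoted phrase avoids the extra word-query for the dash under a given set of options is worth confirming with search:parse, as Danny's example does.

```xquery
xquery version "1.0";

(: Hypothetical helper for the "make it a phrase" approach: wrap a title
   taken from the database in double quotes so search:parse sees one phrase.
   Embedded double quotes are removed first (an assumption), because they
   would otherwise terminate the phrase. :)
declare function local:as-phrase($title as xs:string) as xs:string
{
  fn:concat('"', fn:normalize-space(fn:replace($title, '"', '')), '"')
};

(: Returns the title wrapped in double quotes, ready for search:parse :)
local:as-phrase('Venue &#x2015; Motion to Transfer')
```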
>>
>>
>> Best,
>>
>> Will
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
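Mike's note earlier in the thread, that a fuller query-eval might use a lookup table to map codes to QNames instead of assuming foo=bar means xs:QName('foo'), could look roughly like the sketch below. The field codes ("ti", "au"), element names and the function `local:field-qname` are all hypothetical; this is plain XQuery 1.0 and is not part of xqysp.

```xquery
xquery version "1.0";

(: Hypothetical lookup table: short field codes used in query text
   (e.g. "ti" in ti=Venue) mapped to element namespace URI and local name. :)
declare variable $local:FIELDS as element(field)+ := (
  <field code="ti" ns="" local="title"/>,
  <field code="au" ns="" local="author"/>
);

(: Return the QName for a field code, or the empty sequence if unknown. :)
declare function local:field-qname($code as xs:string) as xs:QName?
{
  for $f in $local:FIELDS[@code eq $code]
  return fn:QName(fn:string($f/@ns), fn:string($f/@local))
};

local:field-qname("ti")
```

In MarkLogic, a non-empty result could then be passed to cts:element-value-query in place of the xs:QName('foo') shortcut; an unknown code yields the empty sequence, so the caller can decide whether to fall back to a plain word query.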
