A lot has been said already in this thread, but what strikes me is that the string is tokenized in the first place. Why not send it through as a single phrase to a single cts:word-query? Then the dash would simply be ignored, without negative side effects.
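A minimal sketch of that suggestion, assuming the autocomplete hands over a literal title rather than a query in search grammar. It builds one phrase query instead of one cts:word-query per token. The expectation that "punctuation-insensitive" lets the standalone dash drop out of the phrase is an assumption to verify against your own data.

```xquery
xquery version "1.0-ml";

(: The raw title string from the autocomplete, passed through unsplit. :)
let $q := "Venue &#x2015; Motion to Transfer"
(: One phrase query rather than an and-query of single-word queries.
   With "punctuation-insensitive", punctuation in the phrase should not
   have to match, so the dash is expected not to limit results. :)
let $query :=
  cts:word-query($q,
    ("case-insensitive", "punctuation-insensitive", "lang=en"), 1)
return cts:search(fn:doc(), $query)[1 to 10]
```

The trade-off is that this bypasses search:parse entirely, so quotes, OR, negation and constraints in the user's input are no longer interpreted.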
Regards

-----Original Message-----
From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Will Thompson
Sent: Friday, January 27, 2012 3:01
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] en/em dashes punctuation?

Thanks Danny, but I'm not sure I follow. Maybe that was not the best explanation. Rather than use dashes like hyphens, I just want a search for something like "Venue ― Motion to Transfer" to ignore the dash when parsed. It appears to be treated like a word instead, and it is not ignored:

cts:and-query((
  cts:word-query("Venue", ("case-insensitive", "punctuation-insensitive", "lang=en"), 1),
  cts:word-query("―", ("case-insensitive", "punctuation-insensitive", "lang=en"), 1),
  cts:word-query("Motion", ("case-insensitive", "punctuation-insensitive", "lang=en"), 1),
  cts:word-query("to", ("case-insensitive", "punctuation-insensitive", "lang=en"), 1),
  cts:word-query("Transfer", ("case-insensitive", "punctuation-insensitive", "lang=en"), 1)),
  ())

-Will

-----Original Message-----
From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Danny Sokolsky
Sent: Thursday, January 26, 2012 5:35 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] en/em dashes punctuation?

Hi Will,

One thing you can do is change your search grammar to use a joiner other than the negative sign. Here is the default grammar:

http://docs.marklogic.com/5.0doc/docapp.xqy#display.xqy?fname=http://pubs/5.0doc/xml/search-dev-guide/search-api.xml%2344520

-Danny

-----Original Message-----
From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Will Thompson
Sent: Thursday, January 26, 2012 4:34 PM
To: General MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] en/em dashes punctuation?

Our search autocomplete pulls from doc titles, some of which contain en or em dashes.
However, if the dash is "floating" (i.e. "Venue - Motion to Transfer"), search:parse parses it into the query, even though <term-option>punctuation-insensitive</term-option> is included in the <term> section of the search options node. I thought it might just be ignored when the query is evaluated, but it is definitely limiting the results.

I can confirm they are punctuation:

cts:tokenize("hyphen-en-em-bar--")[. instance of cts:punctuation]
=> "- - - --"

But is there an exception here (the same way hyphens are always parsed as negation)? Do I just need to remove these from the query string before calling search:parse? If there is a cleaner way, that would be great.

Best,
Will

_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general
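One way to do the clean-up Will describes, sketched under assumptions: local:strip-floating-dashes is a hypothetical helper, and the set of dash characters (hyphen-minus, en dash U+2013, em dash U+2014, horizontal bar U+2015) is a choice you may want to widen or narrow. It drops only whitespace-delimited tokens made entirely of dashes, so the remaining grammar still reaches search:parse.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical helper: removes tokens consisting only of dash characters.
   Dashes inside a token ("en-dash", or a negation such as "-foo") are kept. :)
declare function local:strip-floating-dashes($q as xs:string) as xs:string
{
  fn:string-join(
    fn:tokenize(fn:normalize-space($q), " ")
      [fn:not(fn:matches(., "^[\-&#x2013;&#x2014;&#x2015;]+$"))],
    " ")
};

declare variable $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <term>
      <term-option>case-insensitive</term-option>
      <term-option>punctuation-insensitive</term-option>
    </term>
  </options>;

(: The dash token is removed before parsing, so no cts:word-query("―") is built. :)
search:parse(local:strip-floating-dashes("Venue &#x2015; Motion to Transfer"), $options)
```

Filtering whole tokens rather than running a single fn:replace avoids missing back-to-back dashes such as "a - - b", where one match would consume the space the next one needs.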