A lot has been said already in this thread, but what puzzles me is why the
string is tokenized in the first place. Why not just send it through as a
single phrase to a single cts:word-query? Then the dash would simply be
ignored, without negative side effects.
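
A minimal sketch of that idea, assuming the title is passed straight to
cts:word-query instead of going through search:parse. The options are copied
from the parsed query quoted below; whether the server really skips the
floating dash inside a phrase is an assumption that has not been verified here.

```xquery
xquery version "1.0-ml";

(: Search with the whole title as one phrase. With "punctuation-insensitive"
   the floating dash should not have to match as a term of its own
   (assumption, see above). :)
cts:search(
  fn:doc(),
  cts:word-query(
    "Venue ― Motion to Transfer",
    ("case-insensitive", "punctuation-insensitive", "lang=en")))
```

The trade-off is that a single phrase query requires the words to appear in
that order, whereas the and-query produced by search:parse matches them
anywhere in the document.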

Regards

-----Original Message-----
From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Will Thompson
Sent: Friday, January 27, 2012 3:01 AM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] en/em dashes punctuation?

Thanks Danny, but I'm not sure I follow. Maybe that was not the best
explanation. Rather than use dashes like hyphens, I just want a search for
something like "Venue ― Motion to Transfer" to ignore the dash when
parsed. Instead, the dash appears to be treated as a word and is not
ignored:

cts:and-query(
  (cts:word-query("Venue",
     ("case-insensitive", "punctuation-insensitive", "lang=en"), 1),
   cts:word-query("―",
     ("case-insensitive", "punctuation-insensitive", "lang=en"), 1),
   cts:word-query("Motion",
     ("case-insensitive", "punctuation-insensitive", "lang=en"), 1),
   cts:word-query("to",
     ("case-insensitive", "punctuation-insensitive", "lang=en"), 1),
   cts:word-query("Transfer",
     ("case-insensitive", "punctuation-insensitive", "lang=en"), 1)),
  ())

-Will

-----Original Message-----
From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Danny
Sokolsky
Sent: Thursday, January 26, 2012 5:35 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] en/em dashes punctuation?

Hi Will,

One thing you can do is change your search grammar to use a joiner other
than the negative sign.

Here is the default grammar:

http://docs.marklogic.com/5.0doc/docapp.xqy#display.xqy?fname=http://pubs/5.0doc/xml/search-dev-guide/search-api.xml%2344520

-Danny

-----Original Message-----
From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Will
Thompson
Sent: Thursday, January 26, 2012 4:34 PM
To: General MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] en/em dashes punctuation?

Our search autocomplete pulls from doc titles, some of which contain en or
em dashes. However, if the dash is "floating", e.g. "Venue - Motion to
Transfer", search:parse parses it into the query, even though
<term-option>punctuation-insensitive</term-option> is included in the
<term> section of the search options node. I thought it might simply be
ignored when the query is evaluated, but it is definitely limiting the
query.

I can confirm they are punctuation:

cts:tokenize("hyphen-en-em-bar--")[. instance of cts:punctuation]
=> "- - - --"

But is there an exception here (the same way hyphens are always parsed to
negate)? Do I just need to remove these from the query string before
calling search:parse? If there is a cleaner way, that would be great.
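
The "remove these from the query string" workaround could look roughly like
the sketch below. local:strip-dashes is a hypothetical helper, not part of the
Search API, and the options node is reduced to the one term option mentioned
above. It only touches the en dash, em dash and horizontal bar, so the ASCII
hyphen keeps its meaning in the search grammar.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical helper: replace en dash (U+2013), em dash (U+2014) and
   horizontal bar (U+2015) with a space, then collapse the whitespace.
   The ASCII hyphen is deliberately left alone. :)
declare function local:strip-dashes($qtext as xs:string) as xs:string
{
  fn:normalize-space(
    fn:replace($qtext, "[&#x2013;&#x2014;&#x2015;]", " "))
};

(: Reduced options node, assumed for illustration only. :)
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <term>
      <term-option>punctuation-insensitive</term-option>
    </term>
  </options>
return
  search:parse(
    local:strip-dashes("Venue &#x2014; Motion to Transfer"),
    $options)
```

With the dash removed, the string handed to search:parse is "Venue Motion to
Transfer", so no separate term is generated for the dash.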


Best,

Will
_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general