I guess the behavior we want is contrary to what most people expect from an
autocomplete submission, which I probably should have stated first. There are
many overlapping topics in our content, so we still want a user to see docs
with similar keywords, even if they decide they like a specific title enough to
autocomplete on it. So I want it to do the and-query but exclude the dash. But
I'm confused why the dash would be included in the first place, based on the
options I give to parse.
For search:parse("Venue ― Motion to Transfer",
<options xmlns="http://marklogic.com/appservices/search">
<term>
<term-option>case-insensitive</term-option>
<term-option>punctuation-insensitive</term-option>
</term>
</options>)
I would expect:
cts:and-query(
(cts:word-query("Venue", ("case-insensitive", "punctuation-insensitive",
"lang=en"), 1),
cts:word-query("Motion", ("case-insensitive", "punctuation-insensitive",
"lang=en"), 1),
cts:word-query("to", ("case-insensitive", "punctuation-insensitive",
"lang=en"), 1),
cts:word-query("Transfer", ("case-insensitive", "punctuation-insensitive",
"lang=en"), 1)))
But it includes the dash, and evidently "punctuation-insensitive" does not
render this null when it searches either:
cts:word-query("―", ("case-insensitive", "punctuation-insensitive",
"lang=en"), 1)
I'm guessing that it doesn't consider it punctuation if it's surrounded by
whitespace.
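One way to get the and-query without the dash is to strip "floating" dashes from the title before it reaches search:parse. The following is only a sketch: the $title and $cleaned variables are illustrative, and it assumes the regex engine accepts the Unicode dash-punctuation class \p{Pd} (which should cover hyphen, en dash, em dash and horizontal bar). Only dashes that sit between whitespace, or at the very start or end of the string, are removed, so in-word hyphens are left alone.

```xquery
xquery version "1.0-ml";
import module namespace search =
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Illustrative input: an autocomplete title containing a floating dash. :)
declare variable $title := "Venue ― Motion to Transfer";

(: Replace any run of dash characters that is bounded by whitespace or by
   the start/end of the string with a single space, then collapse spaces. :)
let $cleaned :=
  fn:normalize-space(
    fn:replace($title, "(^|\s)\p{Pd}+(\s|$)", " "))
return
  search:parse($cleaned,
    <options xmlns="http://marklogic.com/appservices/search">
      <term>
        <term-option>case-insensitive</term-option>
        <term-option>punctuation-insensitive</term-option>
      </term>
    </options>)
```

With the dash gone, the parsed query should contain word queries only for "Venue", "Motion", "to" and "Transfer".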
-Will
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Danny Sokolsky
Sent: Thursday, January 26, 2012 6:13 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] en/em dashes punctuation?
Sorry Will, I misunderstood (thought you meant the - was being treated as
negation).
Since you are pulling those from data that you know is in your database, how
about making the whole thing a phrase by surrounding it with quotes?
Here is an example of what I mean:
xquery version "1.0-ml";
import module namespace search =
"http://marklogic.com/appservices/search"
at "/MarkLogic/appservices/search/search.xqy";
search:parse('"Venue ― Motion to Transfer"')
-Danny
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Will Thompson
Sent: Thursday, January 26, 2012 6:01 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] en/em dashes punctuation?
Thanks Danny, but I'm not sure I follow. Maybe that was not the best
explanation. Rather than use dashes like hyphens, I just want a search for
something like "Venue ― Motion to Transfer" to ignore the dash when
parsed. It appears to be treating it like a word instead and is not ignored:
cts:and-query(
(cts:word-query("Venue", ("case-insensitive", "punctuation-insensitive",
"lang=en"), 1),
cts:word-query("―", ("case-insensitive", "punctuation-insensitive",
"lang=en"), 1),
cts:word-query("Motion", ("case-insensitive", "punctuation-insensitive",
"lang=en"), 1),
cts:word-query("to", ("case-insensitive", "punctuation-insensitive",
"lang=en"), 1),
cts:word-query("Transfer", ("case-insensitive", "punctuation-insensitive",
"lang=en"), 1)),
())
-Will
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Danny Sokolsky
Sent: Thursday, January 26, 2012 5:35 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] en/em dashes punctuation?
Hi Will,
One thing you can do is change your search grammar to use a joiner other than
the negative sign.
Here is the default grammar:
http://docs.marklogic.com/5.0doc/docapp.xqy#display.xqy?fname=http://pubs/5.0doc/xml/search-dev-guide/search-api.xml%2344520
-Danny
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Will Thompson
Sent: Thursday, January 26, 2012 4:34 PM
To: General MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] en/em dashes punctuation?
Our search autocomplete pulls from doc titles, some of which contain en or em
dashes. However, if the dash is "floating" (i.e., "Venue - Motion to Transfer"),
search:parse parses it into the query, even though
<term-option>punctuation-insensitive</term-option> is included in the <term>
section of the search options node. I thought it may just be getting ignored
when it's evaluated but it's definitely limiting the query.
I can confirm they are punctuation:
cts:tokenize("hyphen-en-em-bar―")[. instance of cts:punctuation] => "- - - ―"
But is there an exception here (the same way hyphens are always parsed to
negate)? Do I just need to remove these from the query string before calling
search:parse? If there is a cleaner way, that would be great.
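For checking how a particular title tokenizes before deciding what to strip, a small sketch along these lines may help. It relies only on cts:tokenize and the cts:punctuation, cts:space and cts:word token types; the sample title is the one from this thread and the labels are purely illustrative.

```xquery
xquery version "1.0-ml";

(: Label each token of the title with the token class cts:tokenize assigns. :)
for $token in cts:tokenize("Venue ― Motion to Transfer")
return
  typeswitch ($token)
    case cts:punctuation return fn:concat("punctuation: [", $token, "]")
    case cts:space       return fn:concat("space: [", $token, "]")
    case cts:word        return fn:concat("word: [", $token, "]")
    default              return fn:concat("other: [", $token, "]")
```

If the dash comes back labeled as punctuation here but still shows up as its own cts:word-query in the parsed output, the difference lies in how search:parse splits the query string rather than in tokenization.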
Best,
Will
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general