I guess the behavior we want is contrary to what most people expect from an 
autocomplete submission, which I probably should have stated first. There are 
many overlapping topics in our content, so we still want a user to see docs 
with similar keywords, even if they decide they like a specific title enough to 
autocomplete on it. So I want it to do the and-query but exclude the dash. But 
I'm confused why the dash would be included in the first place, based on the 
options I give to parse. 

For search:parse("Venue ― Motion to Transfer", 
<options xmlns="http://marklogic.com/appservices/search"> 
  <term> 
    <term-option>case-insensitive</term-option> 
    <term-option>punctuation-insensitive</term-option>
  </term>
</options>)

I would expect:

cts:and-query(
  (cts:word-query("Venue", ("case-insensitive", "punctuation-insensitive", 
"lang=en"), 1),
   cts:word-query("Motion", ("case-insensitive", "punctuation-insensitive", 
"lang=en"), 1),
   cts:word-query("to", ("case-insensitive", "punctuation-insensitive", 
"lang=en"), 1),
   cts:word-query("Transfer", ("case-insensitive", "punctuation-insensitive", 
"lang=en"), 1)))

But it includes the dash, and evidently "punctuation-insensitive" does not 
render this null when it searches either:
cts:word-query("&#x2015;", ("case-insensitive", "punctuation-insensitive", 
"lang=en"), 1)

I'm guessing that it doesn't consider it punctuation if it's surrounded by 
whitespace.
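
In the meantime, here is a minimal sketch of the "strip it before parsing" workaround. The helper name local:strip-floating-dashes is hypothetical (it is not part of the Search API), and the U+2012 to U+2015 range is only my assumption about which dashes show up in our titles. It splits the query string on whitespace and drops any token made up solely of those dashes, so dashes inside a word and ASCII hyphens are left alone:

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical helper, not part of the Search API.
   Drops whitespace-delimited tokens that consist only of figure, en, em
   or horizontal-bar dashes (U+2012 to U+2015). ASCII hyphens are kept,
   since the default grammar gives them a meaning of their own. :)
declare function local:strip-floating-dashes($q as xs:string) as xs:string
{
  fn:string-join(
    fn:tokenize(fn:normalize-space($q), " ")
      [fn:not(fn:matches(., "^[&#x2012;-&#x2015;]+$"))],
    " ")
};

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <term>
      <term-option>case-insensitive</term-option>
      <term-option>punctuation-insensitive</term-option>
    </term>
  </options>
return
  (: The string handed to search:parse is "Venue Motion to Transfer" :)
  search:parse(
    local:strip-floating-dashes("Venue &#x2015; Motion to Transfer"),
    $options)
```

This collapses runs of whitespace as a side effect, which should not matter for a query string, but I have only thought it through for titles shaped like the one above.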

-Will

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Danny Sokolsky
Sent: Thursday, January 26, 2012 6:13 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] en/em dashes punctuation?

Sorry Will, I misunderstood (thought you meant the - was being treated as 
negation).

Since you are pulling those from data that you know is in your database, how 
about making the whole thing a phrase by surrounding it with quotes? 
Here is an example of what I mean:

xquery version "1.0-ml";

import module namespace search = 
  "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

search:parse('"Venue &#x2015; Motion to Transfer"')

-Danny

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Will Thompson
Sent: Thursday, January 26, 2012 6:01 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] en/em dashes punctuation?

Thanks Danny, but I'm not sure I follow. Maybe that was not the best 
explanation. Rather than use dashes like hyphens, I just want a search for 
something like "Venue &#x2015; Motion to Transfer" to ignore the dash when 
parsed. It appears to be treating it like a word instead and is not ignored:

cts:and-query(
  (cts:word-query("Venue", ("case-insensitive", "punctuation-insensitive", 
"lang=en"), 1),
   cts:word-query("&#x2015;", ("case-insensitive", "punctuation-insensitive", 
"lang=en"), 1),
   cts:word-query("Motion", ("case-insensitive", "punctuation-insensitive", 
"lang=en"), 1),
   cts:word-query("to", ("case-insensitive", "punctuation-insensitive", 
"lang=en"), 1),
   cts:word-query("Transfer", ("case-insensitive", "punctuation-insensitive", 
"lang=en"), 1)),
  ())

-Will

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Danny Sokolsky
Sent: Thursday, January 26, 2012 5:35 PM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] en/em dashes punctuation?

Hi Will,

One thing you can do is change your search grammar to use a joiner other than 
the negative sign.

Here is the default grammar:

http://docs.marklogic.com/5.0doc/docapp.xqy#display.xqy?fname=http://pubs/5.0doc/xml/search-dev-guide/search-api.xml%2344520

-Danny

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Will Thompson
Sent: Thursday, January 26, 2012 4:34 PM
To: General MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] en/em dashes punctuation?

Our search autocomplete pulls from doc titles, some of which contain en or em 
dashes. However, if the dash is "floating"- i.e.: "Venue - Motion to Transfer" 
- search:parse parses it into the query, even though 
<term-option>punctuation-insensitive</term-option> is included in the <term> 
section of the search options node. I thought it may just be getting ignored 
when it's evaluated but it's definitely limiting the query.

I can confirm they are punctuation:

cts:tokenize("hyphen-en-em-bar―")[. instance of cts:punctuation]
=> "- - - ―"

But is there an exception here (the same way hyphens are always parsed to 
negate)? Do I just need to remove these from the query string before calling 
search:parse? If there is a cleaner way, that would be great.


Best,

Will
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general