Mike's note about quoting the string to avoid the problem is true for the 
Search API parser as well.  If you find that the regex has undesirable side 
effects or doesn't perform as well as you hope, then that's an avenue to 
explore.

________________________________________
From: [email protected] 
[[email protected]] On Behalf Of Will Thompson 
[[email protected]]
Sent: Saturday, August 11, 2012 12:35 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] search:parse parenthetical grouping

Yeah, unfortunately it will be difficult to make users aware of this. It seems 
like the best workaround for now is to regex the querystring before parsing, 
and convert any tokens we can detect as numbers to a phrase. Then the parser 
leaves the parens alone.

replace($qs,
  '(^|\s)(\d{1,4}[a-z]?(\.\d{1,4})?(\(\d{1,2}\)))(\s|$)',
  '$1"$2"$5')

I usually try to avoid string manipulation before parsing because unexpected 
input can cause things to blow up, but this seems pretty safe.
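
As a rough sketch of how that pre-processing step could be wired in, the replace() call can be wrapped in a small function and its result handed to search:parse. The function name local:quote-refs and the sample query string are made up for illustration; the pattern is the one above, unchanged.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: Hypothetical helper: wraps reference-style numbers such as 123.4(5)
   in double quotes so the parser treats each one as a phrase. :)
declare function local:quote-refs($qs as xs:string) as xs:string
{
  replace($qs,
    '(^|\s)(\d{1,4}[a-z]?(\.\d{1,4})?(\(\d{1,2}\)))(\s|$)',
    '$1"$2"$5')
};

(: The real grouping parens are left alone; only the number is quoted. :)
search:parse(local:quote-refs('123.4(5) AND (hello OR world)'))
```

With this input the string handed to the parser is "123.4(5)" AND (hello OR world). One caveat: each match consumes the whitespace after the token, so when two reference numbers sit side by side (for example 1(2) 3(4)) a single pass quotes only the first. Running the function a second time is one way to pick up the rest.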

-Will

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Danny Sokolsky
Sent: Friday, August 10, 2012 7:06 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] search:parse parenthetical grouping

It probably won't work for you, but one idea is to change the starter for 
grouping to have different delimiting chars.  For example, 2 parens:

<starter strength="30" apply="grouping" delimiter="))">((</starter>

It might be better than a space....
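
For context, here is a sketch of where such a starter would sit in a custom options node. The grammar below is trimmed down and written from memory of the default one, so the quotation, implicit, and joiner entries are assumptions to check against the shipped defaults. Whether a two-character starter behaves well for real queries is untested here.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <grammar>
      <quotation>"</quotation>
      <implicit>
        <cts:and-query strength="20" xmlns:cts="http://marklogic.com/cts"/>
      </implicit>
      <!-- Grouping now needs doubled parens, so a single ( or ) is plain text. -->
      <starter strength="30" apply="grouping" delimiter="))">((</starter>
      <joiner strength="10" apply="infix" element="cts:or-query"
              tokenize="word">OR</joiner>
      <joiner strength="20" apply="infix" element="cts:and-query"
              tokenize="word">AND</joiner>
    </grammar>
  </options>
(: Sample query: single parens inside the number, doubled parens for grouping. :)
return search:parse('123.4(5) AND ((hello OR world))', $options)
```

As far as I know, supplying a grammar replaces the default one wholesale, so any operators left out of this sketch (negation, NEAR, constraints) would need to be added back.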

-Danny

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Friday, August 10, 2012 4:59 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] search:parse parenthetical grouping

That does seem undesirable. I was going to refer you to 
https://github.com/mblakele/xqysp but it doesn't do much better - unless you 
can get your users to quote the number?

import module namespace qe="com.blakeley.xqysp.query-eval"
 at "query-eval.xqy";

qe:parse('123.4(5)'),
qe:parse('"123.4(5)"')
=>
cts:and-query((
  cts:word-query("123", ("lang=en"), 1),
  cts:word-query("4", ("lang=en"), 1),
  cts:word-query("5", ("lang=en"), 1)), ())
cts:word-query("123.4(5)", ("lang=en"), 1)

You can pass that output to search:resolve(), with pretty much the same 
semantics as search:search.
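
A sketch of that hand-off, assuming qe:parse returns a cts:query value as the output above suggests. search:resolve takes the query as an XML element, so the value is serialized by wrapping it in a throwaway element first. The wrapper name and the sample string are arbitrary.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";
import module namespace qe = "com.blakeley.xqysp.query-eval"
  at "query-eval.xqy";

(: Parse the quoted form so the reference number stays a single phrase. :)
let $query := qe:parse('"123.4(5)"')
(: Constructing an element around a cts:query yields its XML form. :)
let $query-xml := <wrapper>{ $query }</wrapper>/*
return search:resolve($query-xml)
```

If the parser returns nothing for an empty query string, the call to search:resolve would need a guard, which this sketch leaves out.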

-- Mike

On 10 Aug 2012, at 16:15 , Will Thompson wrote:

> I need to prevent paren grouping from happening when the parens are part of a 
> string - typically it's a reference-type number. I can't think of a situation 
> where this would be desirable anyway:
>
> search:parse('123.4(5)')
> => cts:and-query((cts:word-query("123.4(5"), cts:word-query(")")))
>
> If I change the grammar to require a space on either or both sides of the 
> paren, then it will always break some legitimate grouping case like "(hello 
> AND world)".
>
> Is there any way to control these grammar options a little further? It would 
> be easy if you could just use regexes in the grammar options, e.g.:
>
> <starter strength="30" apply="grouping"
> delimiter="\)($|\s)">(^|\s)\(</starter>
>
> Thanks,
>
> Will
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
>

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general