Yeah, unfortunately it will be difficult to make users aware of this. It seems
like the best workaround for now is to run a regex over the query string before
parsing and wrap any tokens we can detect as reference numbers in double quotes,
so they are parsed as phrases. Then the parser leaves the parens alone.
replace($qs,
'(^|\s)(\d{1,4}[a-z]?(\.\d{1,4})?(\(\d{1,2}\)))(\s|$)',
'$1"$2"$5')
I usually try to avoid string manipulation before parsing because unexpected
input can cause things to blow up, but this seems pretty safe.
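For anyone who wants to try it, here is a minimal sketch of that pre-processing
step as a standalone function. The name local:quote-refs is made up for
illustration and is not part of the Search API; the pattern is the same one as
above, and the three calls at the end are just sample inputs.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper (not part of the Search API): wrap reference-style
   numbers such as 123.4(5) in double quotes before the string is parsed. :)
declare function local:quote-refs($qs as xs:string) as xs:string
{
  fn:replace($qs,
    (: $1 = leading boundary, $2 = the whole reference number,
       $5 = trailing boundary; $3 and $4 are inner groups and are unused :)
    '(^|\s)(\d{1,4}[a-z]?(\.\d{1,4})?(\(\d{1,2}\)))(\s|$)',
    '$1"$2"$5')
};

local:quote-refs('123.4(5)'),           (: => "123.4(5)" :)
local:quote-refs('see 12a(3) today'),   (: => see "12a(3)" today :)
local:quote-refs('(hello AND world)')   (: => (hello AND world), unchanged :)
```

The result would then be handed to search:parse in place of the raw query
string. One caveat I have not tested: each match consumes the whitespace that
follows it, so when two reference numbers are separated by a single space the
second one will probably be left unquoted.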
-Will
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Danny Sokolsky
Sent: Friday, August 10, 2012 7:06 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] search:parse parenthetical grouping
It probably won't work for you, but one idea is to change the grouping starter
to use different delimiter characters. For example, double parens:
<starter strength="30" apply="grouping" delimiter="))">((</starter>
It might be better than a space....
-Danny
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Friday, August 10, 2012 4:59 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] search:parse parenthetical grouping
That does seem undesirable. I was going to refer you to
https://github.com/mblakele/xqysp but it doesn't do much better - unless you
can get your users to quote the number?
import module namespace qe="com.blakeley.xqysp.query-eval"
at "query-eval.xqy";
qe:parse('123.4(5)'),
qe:parse('"123.4(5)"')
=>
cts:and-query((cts:word-query("123", ("lang=en"), 1), cts:word-query("4",
("lang=en"), 1), cts:word-query("5", ("lang=en"), 1)), ())
cts:word-query("123.4(5)", ("lang=en"), 1)
You can pass that output to search:resolve(), with pretty much the same
semantics as search:search.
-- Mike
On 10 Aug 2012, at 16:15 , Will Thompson wrote:
> I need to prevent paren grouping from happening when the parens are part of a
> string - typically it's a reference-type number. I can't think of a situation
> where this would be desirable anyway:
>
> search:parse('123.4(5)')
> => cts:and-query((cts:word-query("123.4(5"), cts:word-query(")")))
>
> If I change the grammar to require a space on either or both sides of the
> paren, then it will always break some legitimate grouping case like "(hello
> AND world)".
>
> Is there any way to control these grammar options a little further? It would
> be easy if you could just use regexes in the grammar options, i.e.:
>
> <starter strength="30" apply="grouping"
> delimiter="\)(\s|$)">(^|\s)\(</starter>
>
> Thanks,
>
> Will
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
>