Mike's note about quoting the string to avoid the problem is true for the Search API parser as well. If you find that the regex has undesirable side effects or doesn't perform as well as you hope, then that's an avenue to explore.
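
For what it's worth, here is a minimal sketch of that pre-quoting step done per token rather than with a single replace() over the whole string. As far as I can tell, the regex quoted below consumes the whitespace after each match, so in a query like '1(2) 3(4)' the second token has no leading whitespace left to match against and would stay unquoted. The function name local:quote-refs is hypothetical, the pattern is simply the one from the quoted message anchored to a whole token, and the sketch assumes it is acceptable to collapse runs of whitespace to a single space.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: wrap reference-style tokens such as 123.4(5)
   in double quotes so the search grammar treats them as phrases.
   Each whitespace-separated token is tested on its own, so adjacent
   reference tokens are both quoted. :)
declare function local:quote-refs($qs as xs:string) as xs:string
{
  string-join(
    for $tok in tokenize($qs, '\s+')
    return
      (: same shape as the regex in the thread, anchored to the token :)
      if (matches($tok, '^\d{1,4}[a-z]?(\.\d{1,4})?\(\d{1,2}\)$'))
      then concat('"', $tok, '"')
      else $tok,
    ' ')
};

(: returns: "123.4(5)" AND (hello AND world) :)
local:quote-refs('123.4(5) AND (hello AND world)')
```

The result would then be handed to search:parse in place of the raw query string. Like the original regex, this does not look inside phrases the user has already quoted, so a reference number within an existing quoted phrase would end up with nested quotes.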
________________________________________
From: [email protected] [[email protected]] On Behalf Of Will Thompson [[email protected]]
Sent: Saturday, August 11, 2012 12:35 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] search:parse parenthetical grouping

Yeah, unfortunately it will be difficult to make users aware of this. It seems like the best workaround for now is to regex the querystring before parsing, and convert any tokens we can detect as numbers to a phrase. Then the parser leaves the parens alone.

replace($qs, '(^|\s)(\d{1,4}[a-z]?(\.\d{1,4})?(\(\d{1,2}\)))(\s|$)', '$1"$2"$5')

I usually try to avoid string manipulation before parsing because unexpected input can cause things to blow up, but this seems pretty safe.

-Will

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Danny Sokolsky
Sent: Friday, August 10, 2012 7:06 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] search:parse parenthetical grouping

It probably won't work for you, but one idea is to change the starter for grouping to have different delimiting chars. For example, 2 parens:

<starter strength="30" apply="grouping" delimiter="))">((</starter>

It might be better than a space....

-Danny

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Friday, August 10, 2012 4:59 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] search:parse parenthetical grouping

That does seem undesirable. I was going to refer you to https://github.com/mblakele/xqysp but it doesn't do much better - unless you can get your users to quote the number?

import module namespace qe="com.blakeley.xqysp.query-eval" at "query-eval.xqy";
qe:parse('123.4(5)'),
qe:parse('"123.4(5)"')
=>
cts:and-query((cts:word-query("123", ("lang=en"), 1), cts:word-query("4", ("lang=en"), 1), cts:word-query("5", ("lang=en"), 1)), ())
cts:word-query("123.4(5)", ("lang=en"), 1)

You can pass that output to search:resolve(), with pretty much the same semantics as search:search.

-- Mike

On 10 Aug 2012, at 16:15 , Will Thompson wrote:

> I need to prevent paren grouping from happening when the parens are part of a
> string - typically it's a reference-type number. I can't think of a situation
> where this would be desirable anyway:
>
> search:parse('123.4(5)')
> => cts:and-query((cts:word-query("123.4(5"), cts:word-query(")")))
>
> If I change the grammar to require a space on either or both sides of the
> paren, then it will always break some legitimate grouping case like "(hello
> AND world)".
>
> Is there any way to control these grammar options a little further? It would
> be easy if you could just use regexes in the grammar options, i.e.:
>
> <starter strength="30" apply="grouping"
> delimiter="(^|\s)/)">/(($|\s)</starter>
>
> Thanks,
>
> Will
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
