Re: [MarkLogic Dev General] XQYP: Issue with NEAR parsing

Michael Blakeley Sat, 27 Oct 2012 10:55:09 -0700

Unlike the search:parse parser, xqysp has a fixed grammar. The grammar isn't
designed to be user-configurable, just generally useful. It treats NEAR as an
infix operator (as in "cat NEAR dog"), not a prefix operator. It treats most
punctuation like whitespace, that is, as a token boundary.
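To illustrate those two rules, here is a minimal Python sketch. It is not xqysp's XQuery code, and both the operator set and the tokenizer regex are assumptions made for illustration. Parentheses stay tokens, other punctuation such as commas is dropped as a boundary, and NEAR is accepted only between two operands:

```python
import re

# Assumed operator set for illustration only; not copied from xqysp.
INFIX_OPS = {"AND", "OR", "NEAR"}


def tokenize(query):
    """Keep parentheses as tokens; treat other punctuation like whitespace."""
    # Assumption: a word is a run of \w characters. Commas, semicolons, etc.
    # match neither alternative, so they only separate tokens.
    return re.findall(r"[()]|\w+", query)


def parse_flat(tokens):
    """Fold a non-empty 'term OP term OP term ...' list left to right.

    No groups and no prefix operators: every operator must sit between
    two operands.
    """
    node = ("literal", tokens[0])
    i = 1
    while i < len(tokens):
        op = tokens[i]
        if op not in INFIX_OPS or i + 1 >= len(tokens):
            raise ValueError(f"expected 'OP term' at token {i}")
        node = ("infix", op, node, ("literal", tokens[i + 1]))
        i += 2
    return node
```

With this sketch, tokenize("cat,dog") yields two tokens. A leading NEAR, as in "NEAR (cat,dog)", raises ValueError: the infix rule treats the leading NEAR as a plain term and then finds no operator after it.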

The Apache license permits you to fork https://github.com/mblakele/xqysp and
change its behavior to suit yourself, and that's probably easier than writing
a new parser, including all the parts you don't want to change. The XQuery is
fairly straightforward. I would start by modifying the unit tests in
test/xqysp.xml to expect the behavior you want, then enable the $DEBUG
variable in xqysp.xqy and start experimenting. You'll see quite a lot of
debug-state() output in the logs.

The NEAR change would involve moving $TOK-NEAR and $TOK-ONEAR out of 
$TOKS-INFIX and into $TOKS-PREFIX. I'm not sure what other changes you'd have 
to make, but that's where testing comes in.
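The effect of moving NEAR from the infix set to the prefix set can be sketched with a tiny recursive-descent parser in Python. The infix_ops and prefix_ops arguments are hypothetical stand-ins for $TOKS-INFIX and $TOKS-PREFIX. The tree shapes loosely follow the XML in the quoted message, but this is not xqysp's actual code or output:

```python
import re


def tokenize(query):
    """Parentheses are tokens; other punctuation acts as a boundary."""
    return re.findall(r"[()]|\w+", query)


def parse(query, infix_ops, prefix_ops):
    """Tiny recursive-descent parser whose operator roles come from two sets.

    Hypothetical illustration: infix_ops / prefix_ops play the part of
    xqysp's $TOKS-INFIX / $TOKS-PREFIX, but nothing else is modelled.
    """
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def primary():
        tok = take()
        if tok == "(":
            items = []
            while peek() not in (")", None):
                items.append(expr())
            if peek() is None:
                raise ValueError("missing ')'")
            take()  # consume ")"
            return ("group", items)
        if tok == ")":
            raise ValueError("unexpected ')'")
        if tok in prefix_ops:
            # A prefix operator applies to the single operand after it.
            return ("prefix", tok, primary())
        return ("literal", tok)

    def expr():
        node = primary()
        # An infix operator combines the operand before it with the one after.
        while peek() in infix_ops:
            op = take()
            node = ("infix", op, node, primary())
        return node

    result = []
    while peek() is not None:
        result.append(expr())
    return result
```

With NEAR in the infix set, "NEAR (cat,dog)" comes back as a literal NEAR followed by a group. With NEAR moved to the prefix set, it becomes ("prefix", "NEAR", ("group", ...)). In this sketch the move also turns "cat NEAR dog" into a literal followed by a prefix expression. That is exactly the kind of side effect the unit tests should catch.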

The punctuation change looks uglier because the behavior you want is 
state-dependent: different for NEAR than for other cases. Try writing out the 
EBNF to describe that, and you'll see why I find it unappealing. You'd probably 
have to add a third parameter to p:word, telling it whether or not to treat
commas as token boundaries, and modify its callers to match. Then I think the
actual comma-joining could piggyback on the existing behavior for
$TOKS-WORD-JOIN, though it would be slightly more complicated because you want
that behavior to be parameterized.
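Here is a minimal Python sketch of the state-dependent version. A hypothetical comma_is_boundary flag on scan_word plays the role of the suggested third p:word parameter, and only the group after NEAR passes True. Every name here is illustrative, and the sketch handles parentheses only when they follow NEAR. It is not how xqysp's p:word is actually written, and the NEAR/100 distance is not modelled:

```python
def skip(text, pos, comma_is_boundary):
    """Advance past whitespace, and past commas when they are boundaries."""
    while pos < len(text) and (
        text[pos].isspace() or (comma_is_boundary and text[pos] == ",")
    ):
        pos += 1
    return pos


def scan_word(text, pos, comma_is_boundary):
    """Read one word starting at pos and return (word, next_pos).

    Hypothetical stand-in for p:word with an extra flag. Whitespace and
    parentheses always end a word. A comma ends it only when
    comma_is_boundary is True; otherwise the comma is joined into the word.
    """
    start = pos
    while pos < len(text):
        ch = text[pos]
        if ch.isspace() or ch in "()" or (ch == "," and comma_is_boundary):
            break
        pos += 1
    return text[start:pos], pos


def parse_group(text, pos):
    """Parse words up to ')' with commas as boundaries; pos is just past '('."""
    literals = []
    while True:
        pos = skip(text, pos, True)
        if pos >= len(text):
            raise ValueError("missing ')'")
        if text[pos] == ")":
            return ("group", literals), pos + 1
        word, pos = scan_word(text, pos, True)
        if not word:
            raise ValueError(f"unexpected {text[pos]!r} at {pos}")
        literals.append(("literal", word))


def parse(text):
    """Top level: commas join words, except inside the group after NEAR."""
    items = []
    pos = skip(text, 0, False)
    while pos < len(text):
        word, pos = scan_word(text, pos, False)
        if not word:
            raise ValueError(f"unexpected {text[pos]!r} at {pos}")
        rest = skip(text, pos, False)
        if word == "NEAR" and text.startswith("(", rest):
            # The state change: inside NEAR's group, commas split words.
            group, pos = parse_group(text, rest + 1)
            items.append(("prefix", "NEAR", group))
        else:
            items.append(("literal", word))
            pos = rest
        pos = skip(text, pos, False)
    return items
```

With this sketch, "NEAR (cat,dog)" yields two literals inside the NEAR group, while "cat,dog" on its own stays the single literal "cat,dog". These are the two outputs requested in the quoted message.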

-- Mike

On 27 Oct 2012, at 04:53, Abhishek53 S <[email protected]> wrote:

> Hi All,
>  
> We are using XQYSP for search-term parsing (a really extraordinary concept)
> inside our solution. We want both comma and whitespace (currently only
> whitespace) to act as token separators when creating literal nodes for
> proximity search, but not for other clauses.
> For example:
> 
> Search term: "NEAR (cat,dog)" is expected to be parsed as:
>  
> <root>
>   <expression type="prefix" op="NEAR/100">
>     <group>
>       <literal>cat</literal>
>       <literal>dog</literal>
>     </group>
>   </expression>
> </root>
> 
> whereas terms without NEAR clauses should be parsed as:
> 
> Search term phrase: "cat,dog"
> <root>
>   <literal>cat,dog</literal>
> </root>
> 
> Is there any way to move ahead? :)
> Thanks
> Abhishek Srivastav
> Tata Consultancy Services
> Cell:- +91-9883389968
> Mailto: [email protected]
> Website: http://www.tcs.com
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
