Heiko, I am not claiming to be the best expert on grammars and ANTLR. It is 
easy to define two lexer rules for different string literals, as you point out. 
However, in my opinion ANTLR will only extract a LIKE_LITERAL if the string 
starts with a \% or \_. In all other cases it will extract a STRING_LIT and fail 
with an error if it later finds an escaped LIKE character in the string. 

There is no way to detect a LIKE_LITERAL with any fixed number of lookahead 
characters. Please note that ANTLR does not backtrack when it runs into an error 
later on. You can switch on the option backtrack=true, but this would be very 
inefficient.
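
To illustrate the lookahead point, here is a minimal Java sketch. It is not ANTLR code and not part of OpenCMIS; the class LiteralKind and the method classify are hypothetical names made up for this example. It only shows that the kind of a literal depends on a character that can sit arbitrarily far from the opening quote, so the whole body has to be scanned before a decision is possible.

```java
// Hypothetical illustration only; not taken from ANTLR or OpenCMIS.
final class LiteralKind {

    enum Kind { STRING_LIT, LIKE_LIT }

    private LiteralKind() {
    }

    /**
     * Classifies the body of a string literal (without the surrounding quotes).
     * The body counts as LIKE_LIT as soon as it contains \% or \_ anywhere.
     */
    static Kind classify(String body) {
        for (int i = 0; i < body.length(); i++) {
            if (body.charAt(i) == '\\' && i + 1 < body.length()) {
                char next = body.charAt(i + 1);
                if (next == '%' || next == '_') {
                    return Kind.LIKE_LIT;
                }
                i++; // skip the escaped character, e.g. the second backslash of \\
            }
        }
        return Kind.STRING_LIT;
    }

    public static void main(String[] args) {
        System.out.println(classify("Document"));  // no escaped LIKE character
        System.out.println(classify("pa\\%ern"));  // body is pa\%ern
    }
}
```

However long the prefix of ordinary characters is, the answer can still change at the very last character before the closing quote, which is why no fixed lookahead depth is enough.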

There are also fancy features like semantic predicates and other things to deal 
with ambiguous grammars. However I ran into lots of ugly issues when trying 
those for text search. A typical symptom is that it generates Java code that 
fails to compile with messages like "method too large". I ended up turning off 
all these features and looking for another solution.

Unless somebody has a grammar patch that is simple and works, I still tend to 
stick to my previous solution.
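
For reference, the previous solution (one kind of string literal in the lexer, with the context check left to user code) could be supported by helpers roughly like the following. This is only a sketch under assumptions: the class LikeEscapes and the methods unescapeString and likeToRegex are hypothetical names, not the actual OpenCMIS helpers, and translating the LIKE pattern into a java.util.regex expression is just one possible way to keep escaped and unescaped wildcards apart.

```java
import java.util.regex.Pattern;

// Hypothetical helper sketch; names and behavior are assumptions, not OpenCMIS API.
final class LikeEscapes {

    private LikeEscapes() {
    }

    /** Unescapes a literal used outside LIKE; \% and \_ are rejected there. */
    static String unescapeString(String body) {
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            char next = nextChar(body, i++);
            if (next == '\\' || next == '\'') {
                out.append(next);
            } else {
                throw new IllegalArgumentException(
                        "Escape \\" + next + " is not allowed outside LIKE");
            }
        }
        return out.toString();
    }

    /** Translates a LIKE literal body into a regular expression string. */
    static String likeToRegex(String body) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '%') {
                regex.append(".*");   // unescaped %: any sequence of characters
            } else if (c == '_') {
                regex.append('.');    // unescaped _: exactly one character
            } else if (c == '\\') {
                char next = nextChar(body, i++);
                if (next != '%' && next != '_' && next != '\\' && next != '\'') {
                    throw new IllegalArgumentException("Unsupported escape \\" + next);
                }
                regex.append(Pattern.quote(String.valueOf(next))); // literal character
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return regex.toString();
    }

    private static char nextChar(String body, int backslashIndex) {
        if (backslashIndex + 1 >= body.length()) {
            throw new IllegalArgumentException("Dangling backslash at end of literal");
        }
        return body.charAt(backslashIndex + 1);
    }

    public static void main(String[] args) {
        // The literal body pa\%ern should match the value pa%ern only.
        System.out.println(Pattern.matches(likeToRegex("pa\\%ern"), "pa%ern"));
    }
}
```

User code would call likeToRegex for the right-hand side of LIKE and unescapeString everywhere else, so the exception for a misplaced \% or \_ comes from the caller and not from the parser.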

Jens


-----Original Message-----
From: Kiessling, Heiko [mailto:[email protected]] 
Sent: Friday, 23 September 2011 09:52
To: [email protected]
Subject: Re: Issues with 
org.apache.chemistry.opencmis.server.support.query.QueryUtil.traverseStatementAndCatchExc

Hi, Jens,

thanks for getting back on this.

If the lexer generator you use has no means to exclude certain character 
sequences, and so to accept distinct kinds of literals whose context the parser 
can then check, I see no other option. To be more precise: would it be 
possible for the lexer generator to exclude "\%" and "\_" from normal string 
literals and have the lexer accept a LIKE_LIT otherwise? Of course, the 
production rule for LIKE in the parser then has to accept both STRING_LIT and 
LIKE_LIT on the right side.

The lexer generator I used a few years ago (REX) could do this, but if it's not 
possible here, I +1 for your proposal.

Cheers
Heiko

On 22.09.2011 at 21:49, "Jens Hübel" <[email protected]> wrote:

> Hi Heiko,
> 
> again, sorry for the long delay in my reply. After a lot of travel I am now 
> able to look into this issue. You are absolutely right. Backslash escaping 
> for underscore and percent characters is not supported at the moment for LIKE, 
> and this is not what the spec says.
> 
> It is no problem to extend the grammar to support this. However, this has a 
> certain impact. On the lexical level we can only have one kind of string 
> literal, and there is no context to tell whether we are in a LIKE expression 
> or anywhere else. This means backslash escaping for percent and underscore is 
> then allowed for any kind of string literal. Throwing an exception in all 
> other cases, where we are not in a LIKE expression, is then part of the user 
> code and not the parser framework. The best we can do is provide helper 
> functions for unescaping to make this a bit easier.
> 
> If everyone is fine with this approach I will change the lexer grammar.
> 
> Jens
> 
> 
> -----Original Message-----
> From: Kiessling, Heiko [mailto:[email protected]] 
> Sent: Monday, 12 September 2011 16:45
> To: [email protected]
> Subject: RE: Issues with 
> org.apache.chemistry.opencmis.server.support.query.QueryUtil.traverseStatementAndCatchExc
> 
> Hi, Jens,
> 
> thanks for your quick reply. In the meantime I got the snapshot 
> 'chemistry-opencmis-server-support-0.5.0-20110911.030458-142.jar',
> but it still has the problem with the escaping mechanism. 
> The WHERE clause I am trying is
> 'WHERE cmis:name LIKE 'Do\\%ent''.
> 
> Thanks and best regards
> Heiko
> 
> ----------------
> You wrote:
> 
> Hi Heiko,
> 
> are you using the latest snapshot from SVN? Since the last release there have 
> been several fixes and enhancements to the escaping mechanism. Please use the 
> latest version from the trunk if you don't have it and let me know if this 
> still does not work as expected. (A new release will be available soon.)
> 
> There is no kind of semantic analysis in the framework. It is just the parser 
> and any error handling except basic syntax errors is up to you.
> 
> Hope this helps....
> 
> Jens
> 
> -----Original Message-----
> From: Kiessling, Heiko [mailto:[email protected]]
> Sent: Wednesday, 7 September 2011 18:23
> To: [email protected]
> Subject: Issues with 
> org.apache.chemistry.opencmis.server.support.query.QueryUtil.traverseStatementAndCatchExc
> 
> Hi,
> 
> in the course of implementing CMIS queries I have found the following problems 
> with the above method:
> - The parser does not accept escaping backslashes in LIKE strings. For 
>   example, the string 'pa\%ern', which according to the CMIS spec is supposed 
>   to look for the value 'pa%ern', is answered with the two messages 
>   "mismatched character '%' expecting set null" and "mismatched character 
>   '<EOF>' expecting '''" and a CmisInvalidArgumentException. Sounds like a 
>   lexical analysis problem to me.
> - Is there semantic analysis built in? For example, the = ANY operator is 
>   not possible for single-valued properties and, vice versa, the simple = 
>   operator is not allowed for multi-valued properties. However, no error is 
>   reported when parsing this kind of statement.
> 
> Would be great if you could tell us whether these are known limitations at 
> this time but are being worked on, or whether we're making any mistakes.
> 
> Thanks and best regards
> 
> Heiko Kiessling
> Senior Developer
> TIP CORE Conn., Security, Integr. (AG)
> SAP AG | Dietmar-Hopp-Allee 16 | 69190 Walldorf, Germany
> T + 49 6227 745434 | F + 49 6227 7822615
> E [email protected] | www.sap.com
> 
> 
