Hi, Jens,

as I said, going with your original solution is also okay with me.

Best
Heiko

On 23.09.2011 at 22:53, "Jens Hübel" <[email protected]> wrote:

> Heiko, I am not claiming to be the best expert on grammars and ANTLR. It is
> easy to define two lexer rules for different string literals, as you point
> out. However, in my opinion ANTLR will only extract a LIKE_LITERAL if the
> string starts with a \% or \_. In all other cases it will extract a
> STRING_LIT and fail with an error if it later finds an escaped LIKE
> character in the string.
>
> There is no way to detect a LIKE_LITERAL with any fixed number of lookahead
> characters. Please note that ANTLR does not backtrack in case of later
> errors. You can switch on the option backtrack=true, but this would be very
> inefficient.
>
> There are also fancy features like semantic predicates and other things to
> deal with ambiguous grammars. However, I ran into lots of ugly issues when
> trying those for text search. A typical symptom is that ANTLR generates Java
> code that fails to compile with messages like "method too large". I ended up
> turning off all these features and looking for another solution.
>
> Unless somebody has a grammar patch that is simple and works, I still tend to
> stick to my previous solution.
>
> Jens
>
>
> -----Original Message-----
> From: Kiessling, Heiko [mailto:[email protected]]
> Sent: Friday, 23 September 2011 09:52
> To: [email protected]
> Subject: Re: Issues with
> org.apache.chemistry.opencmis.server.support.query.QueryUtil.traverseStatementAndCatchExc
>
> Hi, Jens,
>
> thanks for getting back on this.
>
> If the lexer generator you use has no means to exclude certain character
> sequences, and in effect to accept certain literals for which then the parser
> can check context, I see no other option. To be more precise: would it be
> possible for the lexer generator to exclude "\%" and "\_" from normal string
> literals and have the lexer accept a LIKE_LIT otherwise?
> Of course, the production rule for LIKE in the parser then has to accept
> both STRING_LIT and LIKE_LIT on the right side.
>
> The lexer generator I used a few years ago (REX) could do this, but if it's
> not possible here, I +1 for your proposal.
>
> Cheers
> Heiko
>
> On 22.09.2011 at 21:49, "Jens Hübel" <[email protected]> wrote:
>
>> Hi Heiko,
>>
>> again sorry for the long delay in my reply. After a lot of travel I am now
>> able to look into this issue. You are absolutely right. Backslash escaping
>> for underscore and percent characters is not supported at the moment for
>> LIKE, and this is not what the spec says.
>>
>> It is no problem to extend the grammar to support this. However, this has a
>> certain impact. On the lexical level we can only have one kind of string
>> literal, and there is no context telling us whether we are in a LIKE
>> expression or anywhere else. This means backslash escaping for percent and
>> underscore is then allowed for any kind of string literal. Throwing an
>> exception in all other cases where we are not in a LIKE expression is then
>> part of the user code and not the parser framework. The best we can do is
>> provide helper functions for unescaping to make this a bit easier.
>>
>> If everyone is fine with this approach I will change the lexer grammar.
>>
>> Jens
>>
>>
>> -----Original Message-----
>> From: Kiessling, Heiko [mailto:[email protected]]
>> Sent: Monday, 12 September 2011 16:45
>> To: [email protected]
>> Subject: RE: Issues with
>> org.apache.chemistry.opencmis.server.support.query.QueryUtil.traverseStatementAndCatchExc
>>
>> Hi, Jens,
>>
>> thanks for your quick reply. In the meantime I got the snapshot
>> 'chemistry-opencmis-server-support-0.5.0-20110911.030458-142.jar',
>> but it still has the problem with the escaping mechanism.
>> The WHERE clause I try is
>> 'WHERE cmis:name LIKE 'Do\\%ent''.
>>
>> Thanks and best regards
>> Heiko
>>
>> ----------------
>> You wrote:
>>
>> Hi Heiko,
>>
>> are you using the latest snapshot from SVN? Since the last release there
>> have been several fixes and enhancements to the escaping mechanism. Please
>> use the latest version from the trunk if you don't have it and let me know
>> if this still does not work as expected. (A new release will be available
>> soon.)
>>
>> There is no kind of semantic analysis in the framework. It is just the
>> parser, and any error handling except basic syntax errors is up to you.
>>
>> Hope this helps....
>>
>> Jens
>>
>> -----Original Message-----
>> From: Kiessling, Heiko [mailto:[email protected]]
>> Sent: Wednesday, 7 September 2011 18:23
>> To: [email protected]
>> Subject: Issues with
>> org.apache.chemistry.opencmis.server.support.query.QueryUtil.traverseStatementAndCatchExc
>>
>> Hi,
>>
>> in the course of implementing CMIS queries I have found the following
>> problems with the above method:
>> - The parser does not accept escaping backslashes in LIKE strings. For
>> example, the string 'pa\%ern', which according to the CMIS spec is
>> supposed to look for the value 'pa%ern', is answered with the two
>> messages "mismatched character '%' expecting set null" and "mismatched
>> character '<EOF>' expecting '''" and a CmisInvalidArgumentException.
>> Sounds like a lexical analysis problem to me.
>> - Is there semantic analysis built in? For example, the = ANY operator
>> is not possible for single-valued properties and, vice versa, the simple
>> = operator is not allowed for multi-valued properties. However, no error
>> is reported when parsing this kind of statement.
>>
>> Would be great if you could tell us whether these are known limitations at
>> this time but are being worked on, or whether we're making any mistakes.
>>
>> Thanks and best regards
>>
>> Heiko Kiessling
>> Senior Developer
>> TIP CORE Conn., Security, Integr.
>> (AG)
>> SAP AG | Dietmar-Hopp-Allee 16 | 69190 Walldorf, Germany
>> T + 49 6227 745434 | F + 49 6227 7822615
>> www.sap.com
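
For readers following the thread: the approach Jens proposes (one kind of string literal in the lexer, with the LIKE-specific checks left to user code and supported by unescaping helpers) could look roughly like the sketch below. This is not the OpenCMIS API. The class name LikeEscapes and both method names are hypothetical, and the sketch assumes the literal reaches user code with its surrounding quotes removed and its backslash escapes still in place. It treats \% and \_ as literal characters inside a LIKE pattern and rejects them in any other string literal.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of OpenCMIS.
final class LikeEscapes {

    private LikeEscapes() {
    }

    /**
     * Translates a LIKE pattern into a regex: % matches any run of characters,
     * _ matches one character, and \%, \_, \\ and \' stand for themselves.
     */
    static Pattern likeToRegex(String like) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder(); // pending run of literal characters
        for (int i = 0; i < like.length(); i++) {
            char c = like.charAt(i);
            if (c == '\\') {
                if (i + 1 >= like.length()) {
                    throw new IllegalArgumentException("Dangling backslash in: " + like);
                }
                char next = like.charAt(++i);
                if (next != '%' && next != '_' && next != '\\' && next != '\'') {
                    throw new IllegalArgumentException("Illegal escape \\" + next + " in: " + like);
                }
                literal.append(next);
            } else if (c == '%' || c == '_') {
                flush(regex, literal);
                regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        flush(regex, literal);
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /**
     * Unescapes a string literal used outside LIKE. Only \\ and \' are accepted;
     * \% and \_ raise an error, which is the check the lexer cannot do itself.
     */
    static String unescapePlainLiteral(String literal) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i + 1 >= literal.length()) {
                throw new IllegalArgumentException("Dangling backslash in: " + literal);
            }
            char next = literal.charAt(++i);
            if (next != '\\' && next != '\'') {
                throw new IllegalArgumentException(
                        "Escape \\" + next + " is only allowed in LIKE: " + literal);
            }
            out.append(next);
        }
        return out.toString();
    }

    // Appends the pending literal run to the regex as a quoted block.
    private static void flush(StringBuilder regex, StringBuilder literal) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    public static void main(String[] args) {
        // The pattern from the original report: pa\%ern should match the value pa%ern.
        System.out.println(likeToRegex("pa\\%ern").matcher("pa%ern").matches());
    }
}
```

A query walker would call likeToRegex (or a repository-specific equivalent) when it visits the right-hand side of a LIKE node and unescapePlainLiteral for every other string literal, so the error for a misplaced \% or \_ comes from user code rather than from the parser.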
