Note though that SELECT * FROM cmis:document WHERE CONTAINS
('\u4E2D\u6587') isn't actually legal CMISQL, as currently CMISQL has
no notion of Unicode escaping. The query would have to contain actual
Unicode characters.

But doesn't this query contain actual Unicode characters? \u4E2D and \u6587 are Java Unicode Escapes [1].

Michael
[1] http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#100850
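
The distinction under discussion can be illustrated with a small Java sketch. It assumes the query appears as a string literal in Java source, where the compiler translates Unicode escapes before any parser sees the text. The class and constant names are made up for this illustration.

```java
public class UnicodeEscapeDemo {
    // The Java compiler translates these escapes at compile time,
    // so the resulting string holds two actual Unicode characters.
    static final String COMPILED = "\u4E2D\u6587";

    // Here the backslashes are themselves escaped, so the string holds
    // the literal text of the escapes (12 characters). This is what a
    // query parser would see if the statement arrived from a file or a
    // request body rather than from Java source.
    static final String LITERAL = "\\u4E2D\\u6587";

    public static void main(String[] args) {
        System.out.println(COMPILED.length()); // prints 2
        System.out.println(LITERAL.length());  // prints 12
    }
}
```

In the first case the statement handed to the parser already contains actual Unicode characters. Only in the second case would the query language itself need a notion of Unicode escaping.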

NB: Unicode escaping is only specified in SQL-2008, not SQL-92. See
this for a summary:
http://hsqldb.org/doc/2.0/guide/dataaccess-chapt.html#N11E65

Florent

On Thu, Mar 31, 2011 at 2:00 PM, Florent Guillaume<[email protected]>  wrote:
No objection, I probably wasn't aware of ANTLRStringStream when I
wrote that code.

Florent

On Thu, Mar 31, 2011 at 12:47 PM, Jens Hübel<[email protected]>  wrote:
Florent,

as far as I remember, this code originally came from your side. Would you have
any objections to applying the proposed patch? Would it break something on your
side?

Jens



-----Original Message-----
From: Jens Hübel (JIRA) [mailto:[email protected]]
Sent: Thursday, March 31, 2011 12:42
To: [email protected]
Subject: [jira] [Assigned] (CMIS-344) Query parser should not use UTF-8 encoding


     [ https://issues.apache.org/jira/browse/CMIS-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jens Hübel reassigned CMIS-344:
-------------------------------

    Assignee: Jens Hübel

Query parser should not use UTF-8 encoding
------------------------------------------

                 Key: CMIS-344
                 URL: https://issues.apache.org/jira/browse/CMIS-344
             Project: Chemistry
          Issue Type: Bug
          Components: opencmis-server
    Affects Versions: OpenCMIS 0.4.0
            Reporter: Michael Dürig
            Assignee: Jens Hübel
         Attachments: CMIS-344.patch


QueryUtil converts the query statement to a UTF-8 encoded byte array which is 
used as input to the lexer instead of using the string directly.
Instead of
     CharStream input = new ANTLRInputStream(new ByteArrayInputStream(statement.getBytes("UTF-8")));
the input stream should be obtained like this:
     CharStream input = new ANTLRStringStream(statement);
The former method transforms the characters in the contains clause of the query
     SELECT * FROM cmis:document WHERE CONTAINS ('\u4E2D\u6587')
in an incorrect way.
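
A minimal sketch of how such a round trip can corrupt the statement, using only the JDK. It assumes the byte stream is decoded with a charset other than UTF-8 (for example a platform default), which the report above does not spell out. ISO-8859-1 stands in for that charset, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {
    // Encode the statement as UTF-8 bytes, then decode the bytes with the
    // given charset. This mimics feeding a byte stream to a reader whose
    // decoding charset may differ from the one used for encoding.
    static String roundTrip(String statement, Charset decodeWith) {
        byte[] utf8 = statement.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, decodeWith);
    }

    public static void main(String[] args) {
        String query = "SELECT * FROM cmis:document WHERE CONTAINS ('\u4E2D\u6587')";

        // Matching charsets: the statement survives unchanged.
        System.out.println(query.equals(roundTrip(query, StandardCharsets.UTF_8)));

        // Mismatched charsets (assumed stand-in for a non-UTF-8 default):
        // each multi-byte character is split into several wrong characters.
        System.out.println(query.equals(roundTrip(query, StandardCharsets.ISO_8859_1)));
    }
}
```

Passing the string directly to the lexer, as the patch proposes, avoids the encode and decode steps altogether.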

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




--
Florent Guillaume, Director of R&D, Nuxeo
Open Source, Java EE based, Enterprise Content Management (ECM)
http://www.nuxeo.com   http://www.nuxeo.org   +33 1 40 33 79 87