[il-antlr-interest: 24194] Re: [antlr-interest] About literal supports unicode

Sam Barnett-Cormack Sat, 13 Jun 2009 08:22:24 -0700

Ha Luong wrote:
> Dear all,
> 
> I tried to use the following grammar to accept Unicode strings:
> //modify T.g in the example source of ANTLR book
> grammar T;
> options {
>     language=Java;
> }
> @members {
> String s;
> }
> r : ID '#' {s = $ID.text; System.out.println("found "+s);} ;
> ID: ('a'..'z'|'\u00e0')+ ; //\u00e0
> WS: (' '|'\n'|'\r')+ {skip();} ; // ignore whitespace
> 
> and do these commands in cygwin:
> java org.antlr.Tool T.g
> javac *.java
> 
> If I test the literal 'a', it works:
> java Test
> a #
> ^Z
> found a
> 
> but with the literal 'à', I get an error:
> java Test
> à
> #
> ^Z
> line 1:0 no viable alternative at character 'à'
> line 2:0 missing ID at '#'
> found <missing ID>


The question that immediately occurs is whether your 'à' is actually the 
precomposed U+00E0, or the decomposed sequence U+0061 followed by U+0300 
(combining grave accent). I'm not sure whether this would make a 
difference here: Unicode treats the two forms as canonically equivalent, 
but whether a given implementation compares them as identical is less 
clear. It's the first thing I would check.
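
One way to check which form you actually have is to dump the code points 
of the input and compare them before and after normalization. The 
following is only a minimal sketch, assuming Java 8 or later; the class 
name CodePointCheck and the helper codePoints are made up for 
illustration and are not part of ANTLR or of the book's Test harness.

```java
import java.text.Normalizer;

public class CodePointCheck {
    // Render each code point of s in U+XXXX notation, space-separated.
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String precomposed = "\u00e0";  // single code point U+00E0
        String decomposed = "a\u0300";  // U+0061 followed by combining grave U+0300

        System.out.println(codePoints(precomposed));
        System.out.println(codePoints(decomposed));

        // Plain String comparison does not apply canonical equivalence.
        System.out.println(precomposed.equals(decomposed));

        // NFC composes the base letter and combining mark into one code point.
        String nfc = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(codePoints(nfc));
        System.out.println(nfc.equals(precomposed));
    }
}
```

If the dump of your real input shows U+0061 U+0300, normalizing it to NFC 
before handing it to the lexer would presumably let the '\u00e0' 
alternative in ID match. If it shows something else entirely, the input 
encoding would be the next thing to look at.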

Sam


