Ha Luong wrote:
> Dear all,
>
> I tried to use the following grammar to accept Unicode strings:
> //modify T.g in the example source of ANTLR book
> grammar T;
> options {
> language=Java;
> }
> @members {
> String s;
> }
> r : ID '#' {s = $ID.text; System.out.println("found "+s);} ;
> ID: ('a'..'z'|'\u00e0')+ ; //\u00e0
> WS: (' '|'\n'|'\r')+ {skip();} ; // ignore whitespace
>
> and do these commands in cygwin:
> java org.antlr.Tool T.g
> javac *.java
>
> If I test with the literal 'a', it works:
> java Test
> a #
> ^Z
> found a
>
> but with the literal 'à', I get an error:
> java Test
> à
> #
> ^Z
> line 1:0 no viable alternative at character 'à'
> line 2:0 missing ID at '#'
> found <missing ID>
The question that immediately occurs is whether your 'à' is actually a
single U+00E0 (the precomposed form), or an 'a' (U+0061) followed by a
combining grave accent (U+0300). I'm not sure whether this would make a
difference (the standard seems a little foggy as to whether
implementations should treat the two as identical), but it's the first
thing I would check.
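
One way to check which form a string holds is to print its code points
and compare it against the NFC-normalized version. The following is only a
sketch using the JDK's java.text.Normalizer; the class name
NormalizationCheck and the helper codePoints are made up for illustration
and are not part of ANTLR or the book's example. It assumes the lexer
compares raw characters without normalizing them, which I have not
verified for your setup.

```java
import java.text.Normalizer;

public class NormalizationCheck {
    // Hypothetical helper: render each code point of s as U+XXXX, space separated.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String precomposed = "\u00e0";   // single code point, as written in the grammar
        String decomposed = "a\u0300";   // 'a' followed by a combining grave accent

        System.out.println(codePoints(precomposed));
        System.out.println(codePoints(decomposed));

        // Plain String comparison treats the two forms as different.
        System.out.println(precomposed.equals(decomposed));

        // NFC composes 'a' + U+0300 into U+00E0, so the forms compare equal afterwards.
        String nfc = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(nfc.equals(precomposed));
    }
}
```

If the input turns out to be decomposed, normalizing it to NFC before
handing it to the lexer would be one option to try.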
Sam