I am working on an ANTLR grammar to support the ISO Standard Z notation (a specification language). The Z character set includes many non-ASCII characters, so the lexer must recognize Unicode character sequences. For lexer token definitions that use four-hex-digit Unicode escapes (\uxxxx), I believe ANTLR works fine.
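As background on why the four-hex case is the easy one, here is a minimal Java sketch. It assumes the lexer is generated for the Java target and matches input one 16-bit char at a time; that is an assumption worth checking against the ANTLR runtime in use. The class UnicodeEscape and its methods are hypothetical helpers written for illustration, not part of ANTLR. They show that a four-hex code point fits in one char, while a five-hex code point needs a surrogate pair.

```java
// Hypothetical helper for illustration only; not part of ANTLR.
final class UnicodeEscape {

    // Parses the 4 or 5 hex digits of an escape (e.g. "2119" or "1D538")
    // into a Unicode code point.
    static int parseCodePoint(String hexDigits) {
        if (!hexDigits.matches("[0-9A-Fa-f]{4,5}")) {
            throw new IllegalArgumentException(
                "expected 4 or 5 hex digits: " + hexDigits);
        }
        return Integer.parseInt(hexDigits, 16);
    }

    // Returns the UTF-16 code units Java uses for the code point:
    // one char up to U+FFFF, a surrogate pair (two chars) above it.
    static char[] toUtf16(int codePoint) {
        return Character.toChars(codePoint);
    }

    // True when the code point lies above U+FFFF, so a lexer that
    // compares single char values cannot match it as one character.
    static boolean needsSurrogatePair(int codePoint) {
        return Character.isSupplementaryCodePoint(codePoint);
    }
}
```

For example, parseCodePoint("2119") gives a code point that toUtf16 returns as a single char, while parseCodePoint("1D538") gives one that toUtf16 returns as the two chars 0xD835 and 0xDD38. If the assumption above holds, a five-hex escape in the grammar would have to be matched as that pair of chars rather than as a single character.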
I have encountered a problem when attempting to recognize two required Standard Z symbols that lie "above" the four-hex range recognized by my generated lexer. The two symbols are \u1D538 and \u1D53D.

A review of the UCS documentation (http://unicode.org/Public/UNIDATA/UnicodeData.txt) indicates that there is indeed a fifth hex digit in public use, albeit infrequently: primarily for mathematics, musical symbols and a few other areas. I am not sure many people are writing grammars that need to recognize such character sets. Interestingly, the fifth hex digit only needs to reach E, as the highest UCS symbol in public use currently appears to be \uE01EF. The range from F0000 upward appears to be for private use only.

Looking at the ANTLRv3.g grammar, I believe the four-hex Unicode escape is defined within the ESC fragment definition, at line 495:

    'u' XDIGIT XDIGIT XDIGIT XDIGIT

Is the solution to recognize an optional fifth digit? Could I simply replace line 495 as below and add a new fragment?

    'u' ZDIGIT? XDIGIT XDIGIT XDIGIT XDIGIT

    fragment ZDIGIT : '0' .. '9' | 'a' .. 'e' | 'A' .. 'E' ;

Are there other implementation considerations I have overlooked? Is the use case too limited for this to be considered, or reported as, an actual ANTLR bug? If so, should I build my own customized ANTLR?

Thank you for considering this.

Kieran Beltran
