I think you mean 'standard Unicode encodings' rather than Unicode ;-) This is built into the next release; I just have not had time to do the actual release yet. You can get the new sources from
http://fisheye2.atlassian.com/browse/antlr, though they are not well tested yet. You can also get a Perforce login from Terence, or use the git mirror at http://github.com/antlr. You will need to read through the new source to use it, as I have not had time to update the docs yet either.

Jim

From: Goins, John C (IS)
Sent: Monday, February 22, 2010 12:40 PM
To: Jim Idle; [email protected]
Subject: RE: [antlr-dev] UNICODE file input for C Runtime

I was wondering whether there is source code or a C runtime update available yet that handles loading UNICODE files in the C runtime. If so, where can I get it? Will the next release be a 3.x version or a 4.x version?

TIA

From: [email protected] On Behalf Of Jim Idle
Sent: Wednesday, January 06, 2010 7:43 PM
To: [email protected]
Subject: Re: [antlr-dev] UNICODE file input for C Runtime

You should find sample C code by searching antlr.markmail.org. However, if you can wait a few weeks, the next release will support a universal input stream that processes the BOM and supports UTF-8, UTF-16, UTF-32, ASCII/8-bit and EBCDIC.

Jim

From: [email protected] On Behalf Of Goins, John C (IS)
Sent: Wednesday, January 06, 2010 3:38 PM
To: [email protected]
Subject: [antlr-dev] UNICODE file input for C Runtime

I've found ANTLR very useful as a language parser for my application, but I now have a requirement to use UNICODE files as input. I'm using the C runtime since my application is written in C. I hope someone can help me with a couple of questions.

There are two bytes, the byte order mark (BOM), at the beginning of a UNICODE file. My application will run on multiple platforms (Java wasn't an option), so I will need to interpret the BOM myself, since I don't think ANTLR uses it. Is that correct? I can write a function that reads the BOM and always converts the input to one particular byte order (the input files could come from machines with different architectures). I think that is a correct approach, unless there is something in the ANTLR C runtime that can help.

I've read that I need to convert a UNICODE file to UTF-32 and use the UCS2 input functions, but I've had little to no success doing so. I get lots of errors, or things just don't parse. Does anyone have sample C code that accomplishes this? Or even the functions I should use and the order in which to call them?

TIA
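The do-it-yourself approach described in the question (read the BOM, then force one byte order) can be sketched as below. This is a minimal sketch under stated assumptions: the whole file has already been read into memory, and only UTF-16 input is converted, into native-order 16-bit units, which is the form a UCS-2 style input stream would expect. The names `bom_encoding`, `detect_bom` and `utf16_to_native` are hypothetical and are not part of the ANTLR C runtime. The sketch deliberately makes no calls into the runtime, because the exact input-stream functions differ between releases. UTF-8 and UTF-32 input would need their own conversion, or the universal input stream Jim mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodings recognisable from a byte order mark (hypothetical names). */
typedef enum {
    ENC_UNKNOWN,  /* no BOM found: the caller must decide, e.g. assume 8-bit */
    ENC_UTF8,
    ENC_UTF16LE,
    ENC_UTF16BE,
    ENC_UTF32LE,
    ENC_UTF32BE
} bom_encoding;

/* Inspect the start of buf for a BOM and store the BOM's length in *bom_len. */
bom_encoding detect_bom(const unsigned char *buf, size_t len, size_t *bom_len)
{
    assert(bom_len != NULL);
    *bom_len = 0;

    /* Check UTF-32 first: FF FE 00 00 also begins with the UTF-16LE BOM.
       (Caveat: a UTF-16LE file whose first character is U+0000 is
       misreported as UTF-32LE.) */
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE && buf[2] == 0x00 && buf[3] == 0x00) {
        *bom_len = 4;
        return ENC_UTF32LE;
    }
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 && buf[2] == 0xFE && buf[3] == 0xFF) {
        *bom_len = 4;
        return ENC_UTF32BE;
    }
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return ENC_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return ENC_UTF16LE;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return ENC_UTF16BE;
    }
    return ENC_UNKNOWN;
}

/* Convert UTF-16 bytes (BOM already skipped) in the given byte order into
   native-order 16-bit code units. out must hold at least nbytes / 2 units.
   Returns the number of units written; a trailing odd byte is ignored. */
size_t utf16_to_native(const unsigned char *src, size_t nbytes,
                       int big_endian, uint16_t *out)
{
    size_t n = nbytes / 2;
    for (size_t i = 0; i < n; i++) {
        unsigned char b0 = src[2 * i];
        unsigned char b1 = src[2 * i + 1];
        /* Assemble each unit from its bytes explicitly, so the result is
           correct regardless of the host machine's own endianness. */
        out[i] = big_endian ? (uint16_t)((b0 << 8) | b1)
                            : (uint16_t)((b1 << 8) | b0);
    }
    return n;
}
```

Typical use would be to call `detect_bom` on the file buffer, then, for `ENC_UTF16LE` or `ENC_UTF16BE`, pass `buf + bom_len` and `len - bom_len` to `utf16_to_native` and hand the resulting array to whichever UCS-2 input function your runtime version provides. A UCS-2 stream has no notion of surrogate pairs, so characters outside the Basic Multilingual Plane pass through as two separate 16-bit units. That is usually acceptable when the grammar only matches BMP characters, but it is worth checking for your input.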
_______________________________________________ antlr-dev mailing list [email protected] http://www.antlr.org/mailman/listinfo/antlr-dev
