Re: [antlr-dev] UNICODE file input for C Runtime

Jim Idle Wed, 06 Jan 2010 16:43:02 -0800

You should find sample C code by searching antlr.markmail.org.

However, if you can wait a few weeks, the next release will support a 
universal input stream that processes the BOM and supports UTF-8, UTF-16, 
UTF-32, ASCII/8-bit, and EBCDIC.

Jim

From: [email protected] [mailto:[email protected]] On 
Behalf Of Goins, John C (IS)
Sent: Wednesday, January 06, 2010 3:38 PM
To: [email protected]
Subject: [antlr-dev] UNICODE file input for C Runtime

I've found ANTLR very useful as a language parser for my application, but I now 
have a requirement to use UNICODE files as input.  I'm using the C runtime 
since my application is written in C. I hope someone can help me with a couple 
of questions.

There are two bytes at the beginning of a UNICODE file: the BOM (byte order 
mark). My application will run on multiple platforms (Java wasn't an option), 
and I think I will need to interpret the BOM myself, since I don't believe 
ANTLR uses it. Is that correct?  Because the input files could come from 
machines with different architectures, I can write a function that reads the 
BOM and always converts the input to one particular byte order. I think that 
is a correct approach, unless there is something in the ANTLR C Runtime that 
can help.
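
A minimal sketch of the "read the BOM myself" approach described above, assuming the first few bytes of the file are already in a buffer. The names `bom_encoding` and `detect_bom` are hypothetical and are not part of the ANTLR C runtime. Note that a BOM is two bytes only for UTF-16; it is three bytes for UTF-8 and four for UTF-32.

```c
#include <stddef.h>

/* Hypothetical result type; not an ANTLR type. */
typedef enum {
    ENC_UNKNOWN,   /* no BOM found; the caller must pick a default */
    ENC_UTF8,
    ENC_UTF16LE,
    ENC_UTF16BE,
    ENC_UTF32LE,
    ENC_UTF32BE
} bom_encoding;

/*
 * Inspect the first bytes of a buffer for a byte order mark.
 * On return, *bom_len holds the number of bytes to skip (0 if no BOM).
 */
static bom_encoding detect_bom(const unsigned char *buf, size_t len,
                               size_t *bom_len)
{
    *bom_len = 0;

    /* UTF-32 must be tested before UTF-16: FF FE 00 00 starts with FF FE. */
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE &&
        buf[2] == 0x00 && buf[3] == 0x00) {
        *bom_len = 4;
        return ENC_UTF32LE;
    }
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 &&
        buf[2] == 0xFE && buf[3] == 0xFF) {
        *bom_len = 4;
        return ENC_UTF32BE;
    }
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return ENC_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return ENC_UTF16LE;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return ENC_UTF16BE;
    }
    return ENC_UNKNOWN;
}
```

One caveat with this sketch: a UTF-16LE file whose first character after the BOM is U+0000 is indistinguishable from UTF-32LE, which is rare enough in text files that the UTF-32 check is usually placed first anyway.
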

I've read that I need to convert a UNICODE file to UTF-32 and use the UCS2 
input functions, but I've had little to no success in doing so.  I get lots of 
errors, or things just don't parse. Does anyone have sample C code that 
accomplishes this? Or even just the functions I should use and the order in 
which to call them?
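
For the conversion step itself, the following is a rough sketch in plain C of decoding UTF-16 bytes of a known byte order into UTF-32 code points, including surrogate pairs. The function name `utf16_to_utf32` is hypothetical; it does not call into the ANTLR C runtime, and which ANTLR input stream function should receive the result is not something this sketch settles.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one 16-bit code unit from two bytes in the given byte order. */
static uint32_t read_unit(const unsigned char *p, int big_endian)
{
    return big_endian ? (((uint32_t)p[0] << 8) | p[1])
                      : (((uint32_t)p[1] << 8) | p[0]);
}

/*
 * Decode UTF-16 bytes (BOM already skipped) into UTF-32 code points.
 * Returns the number of code points written to out, or (size_t)-1 if the
 * input is malformed or out_cap is too small.
 */
static size_t utf16_to_utf32(const unsigned char *in, size_t in_len,
                             int big_endian, uint32_t *out, size_t out_cap)
{
    size_t i = 0, n = 0;

    if (in_len % 2 != 0)
        return (size_t)-1;              /* truncated code unit */

    while (i < in_len) {
        uint32_t u = read_unit(in + i, big_endian);
        i += 2;

        if (u >= 0xD800 && u <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow. */
            uint32_t lo;
            if (i + 2 > in_len)
                return (size_t)-1;
            lo = read_unit(in + i, big_endian);
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            i += 2;
            u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            return (size_t)-1;          /* unpaired low surrogate */
        }

        if (n >= out_cap)
            return (size_t)-1;          /* output buffer too small */
        out[n++] = u;
    }
    return n;
}
```

Since each code point consumes at least two input bytes, an output buffer of `in_len / 2` entries is always large enough.
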
TIA

_______________________________________________
antlr-dev mailing list
[email protected]
http://www.antlr.org/mailman/listinfo/antlr-dev
