Re: [antlr-dev] UNICODE file input for C Runtime

Goins, John C (IS) Thu, 18 Mar 2010 13:49:29 -0700

Jim -


Thanks, I've integrated this release and used it successfully with
UTF-16 and ASCII (8-bit) files so far in limited testing.  However, I'm
having problems with UTF-8.  I tracked the problem down to the function
antlr3StringFactoryNew() inside antlr3string.c.  The switch statement
only sets the API for UTF16 and 8BIT.  I can write more API functions
for the rest, if that's all that's missing, though I suspect you may
have already done so.  I believe the switch will need a case for each
of the encoding types before this version is released, unless I am
missing something.  An error occurs at antlr3filestream.c line 81 when
loading UTF-8 files because the newStr8 function pointer is null for
the input stream.
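
The failure described above (a switch that fills in function pointers
for only some encodings and leaves the rest null) can be illustrated
with a minimal sketch.  Everything in it is hypothetical: encoding_t,
string_api_t, string_api_init and the count_* helpers are invented for
illustration and are not the ANTLR3 C runtime's types or functions.  It
only shows the general pattern of giving every encoding a case and
failing early for anything else.

```c
#include <stddef.h>

/* Hypothetical encoding tags; NOT the ANTLR3 C runtime's constants. */
typedef enum { ENC_8BIT, ENC_UTF8, ENC_UTF16, ENC_UTF32 } encoding_t;

/* Hypothetical per-encoding API table, filled in by a factory function. */
typedef struct {
    encoding_t enc;
    /* Number of code units held in a buffer of nbytes bytes. */
    size_t (*unit_count)(const void *data, size_t nbytes);
} string_api_t;

static size_t count_8(const void *d, size_t n)  { (void)d; return n; }
static size_t count_16(const void *d, size_t n) { (void)d; return n / 2; }
static size_t count_32(const void *d, size_t n) { (void)d; return n / 4; }

/* Fill in the table for one encoding.  Returns 0 on success, or -1 if
 * the encoding has no implementation, so the caller can report the
 * problem instead of calling through a null pointer later. */
static int string_api_init(string_api_t *api, encoding_t enc)
{
    api->enc = enc;
    api->unit_count = NULL;

    switch (enc) {
    case ENC_8BIT:
    case ENC_UTF8:   /* UTF-8 code units are single bytes */
        api->unit_count = count_8;
        return 0;
    case ENC_UTF16:
        api->unit_count = count_16;
        return 0;
    case ENC_UTF32:
        api->unit_count = count_32;
        return 0;
    default:         /* unknown encoding: leave the pointer null */
        return -1;
    }
}
```

A caller would check the return value of string_api_init() before
using api.unit_count, which turns a crash deep inside file loading into
an error at the point where the encoding is chosen.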

 

John  

 

From: [email protected] [mailto:[email protected]]
On Behalf Of Jim Idle
Sent: Monday, February 22, 2010 4:18 PM
To: [email protected]
Subject: Re: [antlr-dev] UNICODE file input for C Runtime

 

I think you mean 'standard Unicode encodings' rather than Unicode ;-)

 

This is built into the next release; I just have not had time to do
the actual release. You can get the new sources from

 

http://fisheye2.atlassian.com/browse/antlr

 

though they are not well tested yet. You can also get a Perforce
login from Terence, or use the Git mirror at http://github.com/antlr
instead. You will need to read through the new source to use it, as I
have not had time to update the docs yet either.

 

 

Jim

 

From: Goins, John C (IS) [mailto:[email protected]] 
Sent: Monday, February 22, 2010 12:40 PM
To: Jim Idle; [email protected]
Subject: RE: [antlr-dev] UNICODE file input for C Runtime

 

I was wondering whether source code or a C runtime update is
available yet that handles loading UNICODE files in the C runtime.  If
so, where can I grab it?  Is the next release a 3.x version or
will it be 4.x?  TIA

 

 

From: [email protected] [mailto:[email protected]]
On Behalf Of Jim Idle
Sent: Wednesday, January 06, 2010 7:43 PM
To: [email protected]
Subject: Re: [antlr-dev] UNICODE file input for C Runtime

 

You should find sample C code by searching antlr.markmail.org

 

However if you can wait a few weeks then the next release will support a
universal input stream that processes BOM and supports UTF8, UTF16,
UTF32, ASCII/8bit and EBCDIC.

 

Jim

 

From: [email protected] [mailto:[email protected]]
On Behalf Of Goins, John C (IS)
Sent: Wednesday, January 06, 2010 3:38 PM
To: [email protected]
Subject: [antlr-dev] UNICODE file input for C Runtime

 

I've found ANTLR very useful as a language parser for my application,
but I now have a requirement to use UNICODE files as input.  I'm using
the C runtime since my application is written in C. I hope someone can
help me with a couple of questions.

There are two bytes at the beginning of a UNICODE file. My application
will run on multiple platforms (Java wasn't an option), and I will
need to interpret the UNICODE BOM (byte order mark) myself, since I
don't think ANTLR uses it.  Is that correct?  I can write a function
that reads the BOM and always converts the input to one particular byte
order (the input files could come from machines with different
architectures). I think that is a correct approach, unless there is
something in the ANTLR C Runtime that can help.
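
The approach described above, reading the BOM yourself, can be sketched
as follows.  This is a minimal illustration and not part of the ANTLR C
runtime: bom_t and detect_bom are hypothetical names.  The byte
sequences themselves are the standard Unicode BOM encodings; note that
only the UTF-16 BOM is two bytes long, while UTF-8 uses three bytes and
UTF-32 uses four.

```c
#include <stddef.h>

/* Hypothetical result type; not an ANTLR3 C runtime type. */
typedef enum {
    BOM_NONE, BOM_UTF8, BOM_UTF16_BE, BOM_UTF16_LE, BOM_UTF32_BE, BOM_UTF32_LE
} bom_t;

/* Inspect the first bytes of a buffer and report which BOM, if any, is
 * present.  *bom_len receives the number of bytes to skip. */
static bom_t detect_bom(const unsigned char *buf, size_t len, size_t *bom_len)
{
    *bom_len = 0;

    /* Test the 4-byte UTF-32 marks first: FF FE 00 00 begins with the
     * same two bytes as the UTF-16 little-endian BOM.  (A UTF-16LE file
     * whose first character is U+0000 would be misread here; that case
     * is assumed not to occur in text input.) */
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 &&
                    buf[2] == 0xFE && buf[3] == 0xFF) {
        *bom_len = 4;
        return BOM_UTF32_BE;
    }
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE &&
                    buf[2] == 0x00 && buf[3] == 0x00) {
        *bom_len = 4;
        return BOM_UTF32_LE;
    }
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return BOM_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return BOM_UTF16_BE;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return BOM_UTF16_LE;
    }
    return BOM_NONE;   /* no BOM: the caller must pick a default */
}
```

The caller would read the start of the file into a buffer, call
detect_bom(), skip bom_len bytes, and then choose a decoder based on
the returned value.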

I've read about how I need to convert a UNICODE file to UTF-32 and use
the UCS2 input functions, but I've had little to no success in doing so.
I get lots of errors or things just don't parse. Does anyone have sample
C code that accomplishes this, or even a list of the functions I should
use and the order in which to call them?
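
For the conversion step asked about above, here is a minimal sketch of
decoding UTF-16 bytes into UTF-32 code points once the byte order is
known (for instance from the BOM).  It is generic C and does not use
any ANTLR C runtime call; read16 and utf16_to_utf32 are hypothetical
names, and how the resulting buffer is handed to the runtime is not
shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one 16-bit code unit from two bytes in the given byte order. */
static uint32_t read16(const unsigned char *p, int big_endian)
{
    return big_endian ? ((uint32_t)p[0] << 8) | p[1]
                      : ((uint32_t)p[1] << 8) | p[0];
}

/* Decode nbytes of UTF-16 into out[0..out_cap).  Returns the number of
 * code points written, or (size_t)-1 on malformed input (odd length,
 * unpaired surrogate) or if out is too small. */
static size_t utf16_to_utf32(const unsigned char *in, size_t nbytes,
                             int big_endian, uint32_t *out, size_t out_cap)
{
    size_t i = 0, n = 0;

    if (nbytes % 2 != 0)
        return (size_t)-1;

    while (i < nbytes) {
        uint32_t u = read16(in + i, big_endian);
        i += 2;

        if (u >= 0xD800 && u <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow. */
            uint32_t lo;
            if (i >= nbytes)
                return (size_t)-1;
            lo = read16(in + i, big_endian);
            i += 2;
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            return (size_t)-1;   /* low surrogate with no high one */
        }

        if (n >= out_cap)
            return (size_t)-1;
        out[n++] = u;
    }
    return n;
}
```

Sizing out to nbytes / 2 elements is always enough, since each code
point consumes at least one 16-bit unit.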

TIA

_______________________________________________
antlr-dev mailing list
[email protected]
http://www.antlr.org/mailman/listinfo/antlr-dev
