Re: [antlr-dev] UNICODE file input for C Runtime

Goins, John C (IS) Thu, 18 Mar 2010 14:09:45 -0700

Jim -


I'm not sure how to proceed.  Internally the string functions seem to be
used in various places (I'm not using them in any of my code).  Do you
think I should just make UTF8 functions and attach them?  If this isn't
fixed, no one will be able to read in UTF8 files for this version, since
these methods are NULL when you read in a UTF8 file, yet they are called
internally.

 

Thanks

 

From: [email protected] [mailto:[email protected]]
On Behalf Of Jim Idle
Sent: Thursday, March 18, 2010 4:55 PM
To: [email protected]
Subject: Re: [antlr-dev] UNICODE file input for C Runtime

 

I have not supplied string methods for those encodings I am afraid, I
did not have time. But the string stuff is just a convenience method -
for performance you should just use the pointers in the tokens.

 

Jim

 

 

 

From: Goins, John C (IS) [mailto:[email protected]] 
Sent: Thursday, March 18, 2010 1:45 PM
To: Jim Idle; [email protected]
Subject: RE: [antlr-dev] UNICODE file input for C Runtime

 

Jim - 

 

Thanks, I've integrated this release and used it successfully with UTF
16 and ASCII (8 bit) files so far in limited testing.  However, I'm
having problems with UTF8.  I tracked the problem down to the function
antlr3StringFactoryNew() inside antlr3string.c.  The case statement only
sets the API for UTF16 and 8BIT.  I can make some more API functions for
the rest, if that's all that's missing.  I suspect you may have already
done so, though.  I believe the case statement will need to be filled
for all the various types before releasing this version, unless I am
missing something.  An error occurs in antlr3filestream.c line 81 when
loading UTF8 files because the newStr8 function is null for the input
stream.
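
To make the failure mode concrete, here is a minimal sketch of how a
factory whose switch statement covers only some encodings leaves a
function pointer NULL, plus a guarded call that fails cleanly instead of
crashing. All names (sk_encoding, sk_factory, sk_factory_init,
sk_make_string) are hypothetical and chosen for illustration; this is not
the runtime's actual code, only the general pattern described above.

```c
#include <stddef.h>

/* Hypothetical encodings; stand-ins for the runtime's own constants. */
typedef enum { SK_ENC_8BIT, SK_ENC_UTF16, SK_ENC_UTF8 } sk_encoding;

/* Hypothetical factory holding one per-encoding function pointer. */
typedef struct sk_factory {
    const char *(*new_str)(const char *text);
} sk_factory;

/* Placeholder implementations for the two encodings that are handled. */
static const char *sk_new_str_8(const char *text)  { return text; }
static const char *sk_new_str_16(const char *text) { return text; }

/* Only two encodings get an implementation; anything else stays NULL. */
static void sk_factory_init(sk_factory *f, sk_encoding enc)
{
    f->new_str = NULL;
    switch (enc) {
    case SK_ENC_8BIT:  f->new_str = sk_new_str_8;  break;
    case SK_ENC_UTF16: f->new_str = sk_new_str_16; break;
    default:           break;  /* UTF-8 lands here, so new_str stays NULL */
    }
}

/* Guarded call: report failure rather than jump through a NULL pointer. */
static const char *sk_make_string(const sk_factory *f, const char *text)
{
    return f->new_str != NULL ? f->new_str(text) : NULL;
}
```

With this shape, either every case in the switch has to be filled in, or
every internal caller has to check for NULL before calling.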

 

John  

 

From: [email protected] [mailto:[email protected]]
On Behalf Of Jim Idle
Sent: Monday, February 22, 2010 4:18 PM
To: [email protected]
Subject: Re: [antlr-dev] UNICODE file input for C Runtime

 

I think you mean 'standard Unicode encodings' rather than Unicode ;-)

 

This is built in to the next release, I just have not had time to get to
doing the actual release. You can get the new sources from 

 

http://fisheye2.atlassian.com/browse/antlr

 

though they are not well tested as yet. You can also get a Perforce
login from Terence, or use the git mirror at http://github.com/antlr.
You will need to read through the new source to use it, as I have not had
time to update the docs yet either.

 

 

Jim

 

From: Goins, John C (IS) [mailto:[email protected]] 
Sent: Monday, February 22, 2010 12:40 PM
To: Jim Idle; [email protected]
Subject: RE: [antlr-dev] UNICODE file input for C Runtime

 

I was wondering whether source code or a C-Runtime update is available
yet that handles loading UNICODE files in the C-Runtime.  If so, where
can I grab it?  Is the next release a 3.x version or will it be 4.x?
TIA

 

 

From: [email protected] [mailto:[email protected]]
On Behalf Of Jim Idle
Sent: Wednesday, January 06, 2010 7:43 PM
To: [email protected]
Subject: Re: [antlr-dev] UNICODE file input for C Runtime

 

You should find sample C code by searching antlr.markmail.org

 

However if you can wait a few weeks then the next release will support a
universal input stream that processes BOM and supports UTF8, UTF16,
UTF32, ASCII/8bit and EBCDIC.

 

Jim

 

From: [email protected] [mailto:[email protected]]
On Behalf Of Goins, John C (IS)
Sent: Wednesday, January 06, 2010 3:38 PM
To: [email protected]
Subject: [antlr-dev] UNICODE file input for C Runtime

 

I've found ANTLR very useful as a language parser for my application,
but I now have a requirement to use UNICODE files as input.  I'm using
the C runtime since my application is written in C. I hope someone can
help me with a couple of questions.

There are two bytes at the beginning of a UNICODE file. My application
will be run on multiple platforms (Java wasn't an option), and I will
need to interpret the UNICODE BOM (byte order mark) myself, since I don't
think ANTLR uses it. Is that correct?  I can write a function that reads
the BOM and always converts the input to one particular byte order (the
input files could come from machines of different architectures). I think
that is a correct approach, unless there is something in the ANTLR C
Runtime that can help.
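
For reference, reading the BOM by hand could look roughly like the sketch
below. It uses the standard Unicode BOM byte sequences; the names
bom_encoding and detect_bom are hypothetical and are not part of the
ANTLR C Runtime. Note that the UTF-8 BOM is three bytes and the UTF-32
BOMs are four, so a check limited to two bytes only covers UTF-16.

```c
#include <stddef.h>

/* Hypothetical result type; not an ANTLR C Runtime definition. */
typedef enum {
    ENC_UNKNOWN,    /* no BOM found; the caller must pick a default */
    ENC_UTF8,
    ENC_UTF16_BE,
    ENC_UTF16_LE,
    ENC_UTF32_BE,
    ENC_UTF32_LE
} bom_encoding;

/* Inspect the first bytes of buf and report which BOM, if any, is present.
 * *bom_len receives the number of bytes to skip before the actual text. */
static bom_encoding detect_bom(const unsigned char *buf, size_t len,
                               size_t *bom_len)
{
    *bom_len = 0;

    /* Test UTF-32 before UTF-16: FF FE 00 00 starts with the UTF-16LE BOM. */
    if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 &&
        buf[2] == 0xFE && buf[3] == 0xFF) {
        *bom_len = 4;
        return ENC_UTF32_BE;
    }
    if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE &&
        buf[2] == 0x00 && buf[3] == 0x00) {
        *bom_len = 4;
        return ENC_UTF32_LE;
    }
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return ENC_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return ENC_UTF16_BE;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return ENC_UTF16_LE;
    }
    return ENC_UNKNOWN;
}
```

One caveat with this ordering: a UTF-16LE file whose first character is
U+0000 would be reported as UTF-32LE, which is rarely a concern for
source text but worth knowing.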

I've read about how I need to convert a UNICODE file to UTF-32 and use
the UCS2 input functions, but I've had little to no success in doing so.
I get lots of errors or things just don't parse. Does anyone have sample
C code that accomplishes this? Or even the functions that I should use
and order in which to call them?
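
As a starting point, normalizing the byte order could be sketched as
below: the raw UTF-16 bytes (with the BOM already skipped) are decoded
into 16-bit code units in the host's native order. The helper name
utf16_to_host is hypothetical, and whether the runtime's UCS2 input
functions expect host-order units is an assumption here, not something
confirmed in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode len bytes of UTF-16 text (BOM already
 * skipped) into host-order 16-bit code units. big_endian is nonzero when
 * the BOM indicated big-endian input. Writes at most dst_cap units and
 * returns the number written. A trailing odd byte is ignored. */
static size_t utf16_to_host(const unsigned char *src, size_t len,
                            int big_endian, uint16_t *dst, size_t dst_cap)
{
    size_t units = len / 2;
    size_t i;

    if (units > dst_cap) {
        units = dst_cap;
    }
    for (i = 0; i < units; i++) {
        /* Pick the high and low byte according to the file's byte order. */
        unsigned hi = big_endian ? src[2 * i]     : src[2 * i + 1];
        unsigned lo = big_endian ? src[2 * i + 1] : src[2 * i];
        /* Building the value arithmetically works on any host. */
        dst[i] = (uint16_t)((hi << 8) | lo);
    }
    return units;
}
```

Because the value is assembled with shifts rather than by casting the
buffer, the same code works regardless of the endianness of the machine
it runs on. Surrogate pairs are passed through unchanged as two units.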

TIA

_______________________________________________
antlr-dev mailing list
[email protected]
http://www.antlr.org/mailman/listinfo/antlr-dev
