Re: [antlr-dev] UNICODE file input for C Runtime

Jim Idle Thu, 18 Mar 2010 13:55:09 -0700

I am afraid I have not supplied string methods for those encodings; I did not
have time. The string methods are only a convenience, though: for performance
you should use the pointers in the tokens directly.



Jim



From: Goins, John C (IS) [mailto:[email protected]]
Sent: Thursday, March 18, 2010 1:45 PM
To: Jim Idle; [email protected]
Subject: RE: [antlr-dev] UNICODE file input for C Runtime



Jim -



Thanks, I've integrated this release and used it successfully with UTF-16 and
ASCII (8-bit) files so far, in limited testing. However, I'm having problems
with UTF-8. I tracked the problem down to the function antlr3StringFactoryNew()
in antlr3string.c. Its case statement only sets up the API for UTF16 and
8BIT. I can write API functions for the remaining encodings, if that's all
that's missing, but I suspect you may have already done so. Unless I am
missing something, the case statement will need cases for all the supported
encodings before this version is released. As it stands, loading a UTF-8 file
fails at line 81 of antlr3filestream.c because the input stream's newStr8
function pointer is NULL.



John



From: [email protected] [mailto:[email protected]] On 
Behalf Of Jim Idle
Sent: Monday, February 22, 2010 4:18 PM
To: [email protected]
Subject: Re: [antlr-dev] UNICODE file input for C Runtime



I think you mean 'standard Unicode encodings' rather than Unicode ;-)



This is built into the next release; I just have not had time to do the
actual release yet. You can get the new sources from



http://fisheye2.atlassian.com/browse/antlr



though they are not well tested yet. You can also get a Perforce login
from Terence, or use the git mirror at http://github.com/antlr

You will need to read through the new source to use it, as I have not had
time to update the docs yet either.



Jim



From: Goins, John C (IS) [mailto:[email protected]]
Sent: Monday, February 22, 2010 12:40 PM
To: Jim Idle; [email protected]
Subject: RE: [antlr-dev] UNICODE file input for C Runtime



I was wondering whether source code or a C runtime update is available yet
that handles loading UNICODE files. If so, where can I get it? Will the next
release be a 3.x version, or will it be 4.x? TIA



From: [email protected] [mailto:[email protected]] On 
Behalf Of Jim Idle
Sent: Wednesday, January 06, 2010 7:43 PM
To: [email protected]
Subject: Re: [antlr-dev] UNICODE file input for C Runtime



You should find sample C code by searching antlr.markmail.org



However, if you can wait a few weeks, the next release will support a
universal input stream that processes the BOM (byte order mark) and supports
UTF-8, UTF-16, UTF-32, ASCII/8-bit and EBCDIC.



Jim



From: [email protected] [mailto:[email protected]] On 
Behalf Of Goins, John C (IS)
Sent: Wednesday, January 06, 2010 3:38 PM
To: [email protected]
Subject: [antlr-dev] UNICODE file input for C Runtime



I've found ANTLR very useful as a language parser for my application, but I now 
have a requirement to use UNICODE files as input.  I'm using the C runtime 
since my application is written in C. I hope someone can help me with a couple 
of questions.

A UNICODE (UTF-16) file begins with a two-byte BOM (byte order mark). My
application will run on multiple platforms (Java wasn't an option), so I will
need to interpret the BOM; I don't think ANTLR uses it, is that correct?
Because the input files could come from machines of different architectures,
I can write a function that reads the BOM myself and always normalizes the
input to one particular byte order. I think that is a correct approach, unless
there is something in the ANTLR C runtime that can help.

I've read that I need to convert a UNICODE file to UTF-32 and use the UCS2
input functions, but I've had little to no success doing so. I get lots of
errors, or things just don't parse. Does anyone have sample C code that
accomplishes this? Or even just the functions I should use and the order in
which to call them?

TIA

_______________________________________________
antlr-dev mailing list
[email protected]
http://www.antlr.org/mailman/listinfo/antlr-dev
