This is not a released version - I have not finished that stuff yet. However, if you are not using these things yourself, you should not need to worry about it. There should not be any direct dependence on the STRINGs even if something tries to set one up. I sometimes wish I had never written them to be honest ;-) They only appear if you ask for $T.text.
You can just copy the 8-bit methods for UTF-8 and so on so that things will work. The filestream will perhaps have the file name wrong, but that should not really matter.

Jim

From: Goins, John C (IS) [mailto:[email protected]]
Sent: Thursday, March 18, 2010 2:09 PM
To: Jim Idle; [email protected]
Subject: RE: [antlr-dev] UNICODE file input for C Runtime

Jim -

I'm not sure how to proceed. Internally the string functions seem to be used in various places (I'm not using them in any of my code). Do you think I should just make UTF8 functions and attach them? If this isn't fixed, no one will be able to read in UTF8 files for this version, since these methods are NULL when you read in a UTF8 file and they are called internally.

Thanks

From: [email protected] [mailto:[email protected]] On Behalf Of Jim Idle
Sent: Thursday, March 18, 2010 4:55 PM
To: [email protected]
Subject: Re: [antlr-dev] UNICODE file input for C Runtime

I have not supplied string methods for those encodings, I am afraid; I did not have time. But the string stuff is just a convenience method - for performance you should just use the pointers in the tokens.

Jim

From: Goins, John C (IS) [mailto:[email protected]]
Sent: Thursday, March 18, 2010 1:45 PM
To: Jim Idle; [email protected]
Subject: RE: [antlr-dev] UNICODE file input for C Runtime

Jim -

Thanks, I've integrated this release and used it successfully with UTF16 and ASCII (8-bit) files so far in limited testing. However, I'm having problems with UTF8. I tracked the problem down to the function antlr3StringFactoryNew() inside antlr3string.c. The case statement only sets the API for UTF16 and 8BIT. I can make some more API functions for the rest, if that's all that's missing. I suspect you may have already done so, though. I believe the case statement will need to be filled in for all the various types before releasing this version, unless I am missing something.
An error occurs in antlr3filestream.c line 81 when loading UTF8 files because the newStr8 function is null for the input stream.

John

From: [email protected] [mailto:[email protected]] On Behalf Of Jim Idle
Sent: Monday, February 22, 2010 4:18 PM
To: [email protected]
Subject: Re: [antlr-dev] UNICODE file input for C Runtime

I think you mean 'standard Unicode encodings' rather than Unicode ;-) This is built in to the next release; I just have not had time to get to doing the actual release. You can get the new sources from http://fisheye2.atlassian.com/browse/antlr though they are not well tested as of yet. You can also get a perforce login from Terence, or use the git mirror at: http://github.com/antlr

You will need to read through the new source to use it, as I have not had time to update the docs yet either.

Jim

From: Goins, John C (IS) [mailto:[email protected]]
Sent: Monday, February 22, 2010 12:40 PM
To: Jim Idle; [email protected]
Subject: RE: [antlr-dev] UNICODE file input for C Runtime

I was wondering if there is source code or a C runtime update available yet that handles loading UNICODE files in the C runtime. If so, where can I grab it from? Is the next release a 3.x version or will it be 4.x?

TIA

From: [email protected] [mailto:[email protected]] On Behalf Of Jim Idle
Sent: Wednesday, January 06, 2010 7:43 PM
To: [email protected]
Subject: Re: [antlr-dev] UNICODE file input for C Runtime

You should find sample C code by searching antlr.markmail.org

However, if you can wait a few weeks, then the next release will support a universal input stream that processes BOM and supports UTF8, UTF16, UTF32, ASCII/8bit and EBCDIC.

Jim

From: [email protected] [mailto:[email protected]] On Behalf Of Goins, John C (IS)
Sent: Wednesday, January 06, 2010 3:38 PM
To: [email protected]
Subject: [antlr-dev] UNICODE file input for C Runtime

I've found ANTLR very useful as a language parser for my application, but I now have a requirement to use UNICODE files as input.
I'm using the C runtime since my application is written in C. I hope someone can help me with a couple of questions.

There are two bytes at the beginning of a UNICODE file. My application will be run on multiple platforms (Java wasn't an option) and I will need to interpret the UNICODE BOM (byte order mark) myself, since I don't think ANTLR uses it. Is that correct? I can write a function that reads the BOM and always sets the byte order one particular way (the input files could come from machines with different architectures). I think that is a correct approach, unless there is something in the ANTLR C runtime that can help.

I've read about how I need to convert a UNICODE file to UTF-32 and use the UCS2 input functions, but I've had little to no success in doing so. I get lots of errors or things just don't parse. Does anyone have sample C code that accomplishes this? Or even the functions that I should use and the order in which to call them?

TIA
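
For readers following the BOM question above, a minimal C sketch of detecting the byte order mark by hand might look like the following. It does not use the ANTLR C runtime at all; the names bom_encoding and detect_bom are hypothetical and chosen only for illustration. Note that the mark is two bytes only for UTF-16; it is three bytes for UTF-8 and four for UTF-32, so the sketch also reports how many bytes to skip.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; not part of the ANTLR C runtime. */
typedef enum {
    ENC_UNKNOWN,   /* no BOM found; the caller must pick a default */
    ENC_UTF8,
    ENC_UTF16_BE,
    ENC_UTF16_LE,
    ENC_UTF32_BE,
    ENC_UTF32_LE
} bom_encoding;

/* Return nonzero if buf (of length len) starts with the n bytes in bom. */
static int has_prefix(const unsigned char *buf, size_t len,
                      const unsigned char *bom, size_t n)
{
    return len >= n && memcmp(buf, bom, n) == 0;
}

/*
 * Inspect the first bytes of buf and report which BOM, if any, is present.
 * On return, *bom_len holds the number of bytes to skip before the text.
 */
static bom_encoding detect_bom(const unsigned char *buf, size_t len,
                               size_t *bom_len)
{
    static const unsigned char utf32_be[] = { 0x00, 0x00, 0xFE, 0xFF };
    static const unsigned char utf32_le[] = { 0xFF, 0xFE, 0x00, 0x00 };
    static const unsigned char utf8[]     = { 0xEF, 0xBB, 0xBF };
    static const unsigned char utf16_be[] = { 0xFE, 0xFF };
    static const unsigned char utf16_le[] = { 0xFF, 0xFE };

    assert(buf != NULL || len == 0);
    assert(bom_len != NULL);

    /* Test the 4-byte marks first: the UTF-32 LE mark begins with the
       same two bytes as the UTF-16 LE mark. */
    if (has_prefix(buf, len, utf32_be, 4)) { *bom_len = 4; return ENC_UTF32_BE; }
    if (has_prefix(buf, len, utf32_le, 4)) { *bom_len = 4; return ENC_UTF32_LE; }
    if (has_prefix(buf, len, utf8, 3))     { *bom_len = 3; return ENC_UTF8; }
    if (has_prefix(buf, len, utf16_be, 2)) { *bom_len = 2; return ENC_UTF16_BE; }
    if (has_prefix(buf, len, utf16_le, 2)) { *bom_len = 2; return ENC_UTF16_LE; }

    *bom_len = 0;
    return ENC_UNKNOWN;
}
```

A caller would read the first four bytes of the file, call detect_bom on them, skip bom_len bytes, and then choose the matching decoder or input stream. One caveat of this ordering: a UTF-16 LE file whose first character is U+0000 would be reported as UTF-32 LE, which is the usual trade-off in BOM sniffing.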
_______________________________________________ antlr-dev mailing list [email protected] http://www.antlr.org/mailman/listinfo/antlr-dev
