Hi Jim,

On 6/24/2011 12:42 PM, Jim Idle wrote:
> Please note that the documentation for the C runtime in 3.4 is yet to be
> updated. In the meantime, if you wish to try it, then there is one change
> that you need to be aware of:
>
> 1) The distinction between ASCII and UCS2 input streams is now removed
> and there is a single function, antlr3FileStreamNew(), to replace the file
> related input streams and a function, antlr3StringStreamNew(), to replace
> the memory related input streams. Prototypes and usage:
>
> antlr3FileStreamNew(pANTLR3_UINT8 fileName, ANTLR3_UINT32 encoding)
> antlr3StringStreamNew(pANTLR3_UINT8 data, ANTLR3_UINT32 encoding,
>                       ANTLR3_UINT32 size, pANTLR3_UINT8 name)
>
> fileName – path to input file in 8 bit characters. Used to call fopen()
> data – pointer to input data in any encoded form (note that I will change
> this to void * in the next beta/release)
> size – the size of the input data (always bytes, regardless of encoding)
> name – The name to use for the string stream (passed to error handlers for
> instance) may be NULL

It looks like the name argument cannot be NULL. I tried passing NULL and it
promptly crashed. The access violation appears to occur in the strlen() call
inside newStr8(). If I pass in any old string, it works, of course.

I have no use for this name, so I'd like to pass NULL. Is this a bug, or
should I just be passing an empty string instead? I'm using ANTLR3_ENC_8BIT,
if that matters.
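As a stopgap while that question is open, here is a minimal sketch of the workaround described above: substitute a placeholder whenever no name is available. The helper stream_name_or_default() and the placeholder text "<string>" are hypothetical and not part of the ANTLR3 C runtime, and whether an empty string would be equally safe is not confirmed here.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the ANTLR3 C runtime).
 * Returns the caller's name unchanged, or a fixed placeholder when the
 * caller passes NULL, so the runtime never sees a NULL name. */
static const char *stream_name_or_default(const char *name)
{
    return (name != NULL) ? name : "<string>";
}

/* Intended use, following the prototype quoted above (illustrative only):
 *
 *   input = antlr3StringStreamNew(data, ANTLR3_ENC_8BIT, size,
 *               (pANTLR3_UINT8)stream_name_or_default(NULL));
 */
```

The placeholder is only ever seen by error handlers, so any short, recognizable string should do.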
Thanks,
- Justin

> Then the encoding values are:
>
> ANTLR3_ENC_8BIT – 8 bit encoding (ASCII/latin1/etc) (replaces the
> existing ASCII stream)
> ANTLR3_ENC_UTF8 – UTF8 encoding (eats any BOM that may be present)
> ANTLR3_ENC_UTF16 – UTF16 encoding (also handles UCS2) – determine byte
> order from BOM or machine natural without BOM
> ANTLR3_ENC_UTF16BE – UTF16 encoding (also handles UCS2), big endian but no
> BOM
> ANTLR3_ENC_UTF16LE – UTF16 encoding (also handles UCS2), little endian but
> no BOM
> ANTLR3_ENC_UTF32 – UTF32 encoding – determine byte order from BOM or
> machine natural without BOM
> ANTLR3_ENC_UTF32BE – UTF32 encoding – big endian but no BOM
> ANTLR3_ENC_UTF32LE – UTF32 encoding – little endian but no BOM
> ANTLR3_ENC_EBCDIC – EBCDIC encoding (8 bit).
>
> Note that EBCDIC encoding means that the input is in EBCDIC and it is not
> changed. The LA() method for EBCDIC encoding converts a character to ASCII
> before matching. Therefore the pointers to the first character of the token
> in the input stream remain pointing at EBCDIC and you are responsible for
> any conversion of the token strings if you need to convert them.
>
> Encoding is as per the Unicode standards and supports the full Unicode
> character range and all surrogate pairs are decoded in UTF16. Note however
> that for performance reasons, errors in the encoding are usually ignored
> (for instance a valid hi surrogate that does not have a lo surrogate), but
> that invalid sequences that may not be ignored, may screw up your input. You
> can of course override any of the LA methods and report such things as
> errors, should you need to do so. The purpose of LA() is to return the 32
> bit integer Unicode code point for the specified character – how it does
> that is irrelevant to the lexer, which is just matching 32 bit numbers. This
> means you should not code your lexer to match surrogates, just the code
> points.
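To make the surrogate point concrete, here is a minimal sketch of how a UTF-16 surrogate pair collapses into a single 32 bit code point, which is the kind of value the lexer ends up matching. This is standard UTF-16 arithmetic, not the runtime's actual LA() code; the function name utf16_decode() is hypothetical, and it assumes the code units are already in host byte order. Like the behaviour described above, it does not report an unpaired surrogate as an error and simply passes it through.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration, not ANTLR3 runtime code.
 * Decodes one code point from `in` (UTF-16 code units, host byte order).
 * Stores the code point in *cp and returns the number of code units
 * consumed (1 or 2), or 0 if the buffer is empty. */
static size_t utf16_decode(const uint16_t *in, size_t len, uint32_t *cp)
{
    if (len == 0) {
        return 0;
    }

    uint16_t hi = in[0];

    /* A high surrogate followed by a low surrogate forms one code point. */
    if (hi >= 0xD800 && hi <= 0xDBFF && len >= 2) {
        uint16_t lo = in[1];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            *cp = 0x10000u + (((uint32_t)(hi - 0xD800) << 10)
                              | (uint32_t)(lo - 0xDC00));
            return 2;
        }
    }

    /* BMP character, or an unpaired surrogate passed through unchanged. */
    *cp = hi;
    return 1;
}
```

For example, the two code units 0xD83D 0xDE00 decode to the single code point 0x1F600, so a lexer rule would match 0x1F600 rather than the two surrogates.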
>
> Jim

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
