Andy,

I think your likely issues are:

1) as mentioned earlier, the length you are passing in is in bytes, but
the stream needs the number of 16-bit characters;
2) the encoding isn't what you think it is, and the 16-bit characters
are garbled enough to break your lexer. Make sure you have a catch-all
ANY token listed last in your lexer:

ANY : . { printf("some message"); } ;

3) the memory you are passing is not converted to 16-bit correctly
using the calls you have here;

4) something else.
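
To illustrate points 1 and 3: building a std::wstring from sql.begin()/sql.end() only widens each byte, it does not decode UTF-8, and wchar_t is 32 bits on many platforms, so casting it to a 16-bit pointer does not give the stream what it expects. Below is a minimal sketch of an explicit conversion. The helper name utf8ToUtf16 is hypothetical (it is not part of the ANTLR runtime), it assumes the input really is UTF-8, and it does not reject overlong encodings.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not part of the ANTLR C runtime: decodes UTF-8 bytes
// into UTF-16 code units. Throws std::runtime_error on malformed input.
// Minimal sketch: overlong encodings are not rejected.
std::vector<std::uint16_t> utf8ToUtf16(const std::string& in) {
    std::vector<std::uint16_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;   // decoded code point
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");

        if (cp < 0x10000) {
            out.push_back(static_cast<std::uint16_t>(cp));
        } else {
            // Outside the BMP: emit a surrogate pair. A strictly UCS-2
            // consumer may not interpret these as a single character.
            cp -= 0x10000;
            out.push_back(static_cast<std::uint16_t>(0xD800 | (cp >> 10)));
            out.push_back(static_cast<std::uint16_t>(0xDC00 | (cp & 0x3FF)));
        }
    }
    return out;
}

// Possible use with the call from the quoted code (untested assumption):
//   std::vector<std::uint16_t> buf = utf8ToUtf16(sql);
//   input = antlr3NewUCS2StringInPlaceStream(
//       (pANTLR3_UINT16)buf.data(), buf.size(), NULL);
// buf.size() is a count of 16-bit units, not bytes.
```

Since the stream is created "in place", the buffer presumably has to stay alive for as long as the lexer is reading from it, so keep the vector in a scope that outlives the parse.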

Sorry I can't get much further but trying to do everything by iPhone  
is a bit tricky.

Jim

On Jun 16, 2009, at 11:18 AM, Andy Grove <[email protected]>  
wrote:
> Jim,
>
> Thanks. I've attempted to use the UCS input stream with this code:
>
> SymbolTable* SQLParser::parse(std::string sql) {
>
>       ....
>
>       std::wstring wsql(sql.begin(), sql.end());
>       const wchar_t *wsqlchars = wsql.c_str();
>       input = antlr3NewUCS2StringInPlaceStream((pANTLR3_UINT16)wsqlchars, wsql.length(), NULL);
>
>       ...
>
> }
>
> Am I even close with this? It compiles OK but now when I run my test  
> the app becomes unresponsive and consumes all the available RAM.
>
> Thanks,
>
> Andy.
>
>
> On Jun 16, 2009, at 9:21 AM, Jim Idle wrote:
>
>> You need the UCS version of the input stream, or write a UTF-32 input
>> stream and use the pre-supplied UTF-8 to UTF-32 conversion routine.
>>
>> If you can wait until the next release, I will be supplying these ready
>> made, but they are not difficult to produce; just copy the others.
>> Internally the runtime uses 32-bit Unicode and does not care how
>> you provide these.
>>
>> Jim
>>
>> On Jun 16, 2009, at 9:20 AM, Andy Grove  
>> <[email protected]> wrote:
>>
>>> I have a SQL parser that is working fine with standard ASCII  
>>> characters but if I try and insert data containing international  
>>> characters such as:
>>>
>>> "INSERT INTO customer (username, password, title, first_name,  
>>> last_name, addr_line1, addr_line2, addr_city, addr_state,  
>>> country_id) VALUES (''username123', 'password', 'Mr', 'Tåst',
>>> 'Test', 'Test', 'Test', 'Test', 'TE', 1)"
>>>
>>> I get this error:
>>>
>>> -memory-(1) : lexer error 1 :
>>>     Unexpected character at offset 179, near char(0XC3) :
>>>     åst', 'Test', 'Test
>>>
>>> Here is my setup code:
>>>
>>>     input = antlr3NewAsciiStringInPlaceStream((pANTLR3_UINT8)stringCopy, l, NULL);
>>>     lexer = DbsMySQL_CPPLexerNew(input);
>>>     tstream = antlr3CommonTokenStreamSourceNew(ANTLR3_SIZE_HINT, lexer->pLexer->rec->state->tokSource);
>>>     parser = DbsMySQL_CPPParserNew(tstream);
>>>
>>> Do I need to specify the character set somewhere?
>>>
>>> Thanks,
>>>
>>> Andy.
>>>
>>> ---
>>> Andy Grove
>>> Chief Architect
>>> CodeFutures Corporation
>>> "Share Nothing. Shard Everything."
>>>
>>> Cell:    (303) 720-1285
>>> E-Fax:   (303) 395-0426
>>> Web:     http://www.codefutures.com/
>>> Twitter: http://twitter.com/andygrove73
>>>
>>>
>>>
>>>
>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>> Unsubscribe: 
>>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>

