On Sep 4, 1:29 pm, Rohit <[EMAIL PROTECTED]> wrote:
> On Sep 4, 11:49 am, Folke <[EMAIL PROTECTED]> wrote:
>
> > UTF-8 is already a multibyte representation of Unicode characters.
> > JavaScript operates on UTF-16 characters (wide characters) but for
> > HTTP requests the data is usually encoded with UTF-8.
>
> > BTW, pure US-ASCII data is not changed when encoded with UTF-8; only
> > characters with an ordinal value greater than 127 are converted to two,
> > three or four bytes.
>
> I know this. But there are two layers here: one does not understand
> anything other than US-ASCII, and one understands UTF-8 and US-ASCII.
> What I am asking is whether there is any way I can put some sort of
> translator between these two layers and make them both handle UTF-8 data
> correctly. Even the layer that supports only US-ASCII will be able to
> handle UTF-8 data, as it will receive the output of wctomb or some such
> function (which I am assuming will not contain '\0'), CMIIW. So UTF-8
> data gets converted to some multibyte sequence without '\0', and then
> I can do the reverse translation while reading it back. And if this can be
> done, which Java function from the String class is appropriate for this
> task?
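On the question of which String method to use: in standard (server-side) Java, the counterpart of wctomb is String.getBytes with an explicit charset, and new String(bytes, charset) reverses it. The minimal sketch below assumes Java 7 or later (for StandardCharsets), and the class name Utf8Check is hypothetical. It also tests the assumption about '\0': the UTF-8 bytes contain 0x00 only where the text itself contains U+0000. Whether these calls are available in GWT client code depends on the GWT version's JRE emulation, so treat this as JVM-side code.

```java
import java.nio.charset.StandardCharsets;

/**
 * Minimal sketch (standard Java; Utf8Check is a hypothetical name):
 * String -> UTF-8 bytes and back, plus a check for 0x00 bytes.
 */
public final class Utf8Check {

    private Utf8Check() {}

    /** Encodes s as UTF-8 using String.getBytes with an explicit charset. */
    public static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes UTF-8 bytes back into a String. */
    public static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** True if the array contains a 0x00 byte, i.e. it would be cut short by a C-style terminator. */
    public static boolean containsZeroByte(byte[] bytes) {
        for (byte b : bytes) {
            if (b == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Non-ASCII sample: i-diaeresis (2 bytes), euro sign (3 bytes), an emoji outside the BMP (4 bytes).
        String text = "na" + (char) 0xEF + "ve " + (char) 0x20AC + " "
                + new String(Character.toChars(0x1F600));
        byte[] utf8 = toUtf8(text);

        System.out.println(utf8.length);                   // 15
        System.out.println(containsZeroByte(utf8));        // false
        System.out.println(fromUtf8(utf8).equals(text));   // true

        // Only an actual U+0000 in the text produces a 0x00 byte.
        String withNul = "a" + '\0' + "b";
        System.out.println(containsZeroByte(toUtf8(withNul))); // true
    }
}
```

Note that every byte of a multibyte UTF-8 sequence is 0x80 or above, so the bytes for non-ASCII characters never collide with '\0' or with any 7-bit ASCII character.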
I think you need to get your terminology straight first. US-ASCII is a character set, while UTF-8 is a multibyte character set _encoding_ for Unicode. You cannot convert Unicode to ASCII without losing information.

If your server only allows US-ASCII (7-bit), then you need to convert every character > 127 into a sequence of ASCII characters. In the Java world we use Unicode escape sequences for this: you need a function that outputs \uXXXX for each UTF-16 code unit > 0x7F (and that also escapes the escape character itself, so the conversion can be reversed). The same function can take care of your zero-termination problem and other control characters by escaping everything below 0x20 as well.

An alternative is to use URL-encoding of the UTF-8 data. GWT's URL class has encode and decode methods.

But if by US-ASCII you mean the full 8-bit range (which is not US-ASCII), then UTF-8 already gives you a safe 8-bit multibyte sequence. Zero-termination is then only a problem if the text itself contains U+0000, because UTF-8 never produces a 0x00 byte for any other character. Use a simple regular expression to escape and unescape 0-bytes and your escape character (e.g. \).

If this still does not answer your question, I need a more detailed description of your architecture and the different layers.

You received this message because you are subscribed to the Google Groups "Google Web Toolkit" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/Google-Web-Toolkit?hl=en
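The \uXXXX escaping suggested above for a 7-bit-only layer can be sketched as follows. Assumptions: the class name AsciiEscaper is hypothetical; besides code units above 0x7E (DEL at 0x7F included), it escapes control characters below 0x20, which covers NUL, and the backslash itself, so that unescape can reverse the mapping unambiguously. Characters outside the BMP come out as two escaped UTF-16 code units. It uses only String, StringBuilder and Integer methods; check that your GWT version emulates them before using it in client code.

```java
/**
 * Minimal sketch: map any Java String to printable 7-bit ASCII using
 * Java-style backslash-u escapes (six characters per escaped UTF-16 code unit),
 * and map it back. AsciiEscaper is a hypothetical name.
 */
public final class AsciiEscaper {

    private AsciiEscaper() {}

    /** Escapes control characters (below 0x20, incl. NUL), DEL and everything above 0x7E, plus the backslash. */
    public static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c > 0x7E || c == '\\') {
                String hex = Integer.toHexString(c); // lowercase, no leading zeros
                out.append("\\u");
                for (int pad = hex.length(); pad < 4; pad++) {
                    out.append('0');
                }
                out.append(hex);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Reverses escape(); every backslash in the input must start a six-character escape. */
    public static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\') {
                if (i + 6 > s.length() || s.charAt(i + 1) != 'u') {
                    throw new IllegalArgumentException("malformed escape at index " + i);
                }
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String original = "caf" + (char) 0xE9 + " nul:" + '\0' + " slash:\\ euro:" + (char) 0x20AC;
        String escaped = escape(original);
        System.out.println(escaped);
        System.out.println(unescape(escaped).equals(original)); // true
    }
}
```

Running main prints caf\u00e9 nul:\u0000 slash:\u005c euro:\u20ac and then true. Because the backslash is always escaped, the encoded text never contains a bare backslash, which is what lets unescape treat every backslash as the start of an escape.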
