Re: wctomb equivalent in gwt framework ?

Folke Thu, 04 Sep 2008 14:10:15 -0700

On Sep 4, 1:29 pm, Rohit <[EMAIL PROTECTED]> wrote:
> On Sep 4, 11:49 am, Folke <[EMAIL PROTECTED]> wrote:
>
> > UTF-8 is already a multibyte representation of Unicode characters.
> > JavaScript operates on UTF-16 characters (wide characters) but for
> > HTTP requests the data is usually encoded with UTF-8.
>
> > BTW, pure US-ASCII data is not changed when encoded with UTF-8, only
> > characters with an ordinal value greater than 127 are converted to two,
> > three or four bytes.
>
> I know this. But there are two layers here: one understands only
> US-ASCII, and the other understands both UTF-8 and US-ASCII.
> What I am asking is whether I can put some sort of translator
> between these two layers so that both handle UTF-8 data
> correctly. Even the layer that supports only US-ASCII would be able to
> handle UTF-8 data, because it would receive the output of wctomb or a
> similar function (which I am assuming will not contain '\0'); correct
> me if I am wrong. So the UTF-8 data gets converted to some multibyte
> sequence without '\0', and I can do the reverse translation when
> reading it back. If this can be done, which Java function of the
> String class is appropriate for this task?


I think you need to get your terminology straight first. US-ASCII is
a character set, while UTF-8 is a multibyte character set _encoding_
for Unicode. You cannot convert Unicode text character-for-character
to ASCII without losing information.
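
To illustrate the difference, here is a small JVM-side Java sketch (the class name Utf8VsAscii is hypothetical, and it assumes Java 7 or later for StandardCharsets; GWT client code only emulates part of the JRE). It encodes one ASCII and two non-ASCII characters both ways: UTF-8 keeps 'A' as a single byte and expands the others into bytes that are all >= 0x80, while US-ASCII can only substitute '?' for them.

```java
import java.nio.charset.StandardCharsets;

final class Utf8VsAscii {
    // Format each byte as two lowercase hex digits, separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'A', e-acute (U+00E9) and the euro sign (U+20AC)
        String text = "A\u00e9\u20ac";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);

        // UTF-8: 'A' stays one byte, U+00E9 takes two bytes, U+20AC takes three.
        System.out.println(hex(utf8));  // 41 c3 a9 e2 82 ac
        // US-ASCII: unmappable characters are replaced by '?' (0x3f).
        System.out.println(hex(ascii)); // 41 3f 3f
        // Only the UTF-8 bytes decode back to the original text.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text));     // true
        System.out.println(new String(ascii, StandardCharsets.US_ASCII).equals(text)); // false
    }
}
```

String.getBytes(Charset) and new String(byte[], Charset) are also the standard String-level answer to "which function converts to and from UTF-8" on the server side.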

If your server only allows US-ASCII (7-bit), then you need to convert
all characters > 127 to sequences of ASCII characters. In the Java
world we use Unicode escape sequences for this: you need a function
that outputs \uXXXX for each UTF-16 character (code unit) > 0x7F. The
same escaping can also take care of your zero-termination problem and
other control characters. An alternative is to URL-encode the UTF-8
data; GWT's URL class has encode and decode methods.
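
A minimal sketch of such an escaping function, written as plain JVM Java (Java 10 or later, for the Charset overloads of URLEncoder and URLDecoder). AsciiEscaper, escape and unescape are hypothetical names. It escapes control characters, DEL and everything above 0x7E, and doubles backslashes so the result decodes unambiguously; characters outside the BMP come out as two escaped surrogates, as in Java source. For the URL-encoding alternative it uses the JDK's java.net.URLEncoder/URLDecoder as a stand-in, since GWT's URL class is not available outside GWT.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class AsciiEscaper {
    // Double every backslash and replace each UTF-16 code unit outside printable
    // ASCII (0x20..0x7E) with a backslash, the letter 'u' and four hex digits.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\') {
                sb.append("\\\\");
            } else if (c < 0x20 || c > 0x7e) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Reverse escape(). Assumes well-formed input; malformed hex digits
    // make Integer.parseInt throw NumberFormatException.
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(i + 1);
                if (next == '\\') {
                    sb.append('\\');
                    i += 1;
                    continue;
                }
                if (next == 'u' && i + 5 < s.length()) {
                    sb.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                    i += 5;
                    continue;
                }
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "caf", e-acute, a space, the euro sign, "5" and a trailing NUL character
        String original = "caf\u00e9 \u20ac5" + '\0';
        String escaped = escape(original);
        System.out.println(escaped);
        System.out.println(unescape(escaped).equals(original)); // true

        // URL-encoding alternative, using the JDK classes in place of GWT's URL class.
        String urlEncoded = URLEncoder.encode(original, StandardCharsets.UTF_8);
        System.out.println(urlEncoded); // caf%C3%A9+%E2%82%AC5%00
        System.out.println(URLDecoder.decode(urlEncoded, StandardCharsets.UTF_8).equals(original)); // true
    }
}
```

Running main prints caf\u00e9 \u20ac5\u0000, then true, then caf%C3%A9+%E2%82%AC5%00 (URLEncoder encodes the space as +), then true. Both escaped forms contain only printable ASCII and no zero bytes.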

But if by US-ASCII you mean full 8-bit data (which is not US-ASCII),
then UTF-8 already gives you a safe 8-bit multibyte sequence. Zero
bytes are still a problem, though: UTF-8 never produces a 0x00 byte
inside a multibyte sequence, but the character U+0000 itself is
encoded as a single 0x00 byte. Use a simple regular expression to
escape and unescape 0-bytes and your escape character (e.g. \).

If this still does not answer your question, I need a more detailed
description of your architecture and the different layers.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google Web Toolkit" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/Google-Web-Toolkit?hl=en
-~----------~----~----~----~------~----~------~--~---
