On Sep 5, 2:10 am, Folke <[EMAIL PROTECTED]> wrote:
> On Sep 4, 1:29 pm, Rohit <[EMAIL PROTECTED]> wrote:
>
> > On Sep 4, 11:49 am, Folke <[EMAIL PROTECTED]> wrote:
>
> > > UTF-8 is already a multibyte representation of Unicode characters.
> > > JavaScript strings are sequences of UTF-16 code units (wide characters),
> > > but for HTTP requests the data is usually encoded with UTF-8.
>
> > > BTW, pure US-ASCII data is not changed when encoded with UTF-8; only
> > > characters with an ordinal value greater than 127 are converted to two,
> > > three or four bytes.
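To make the byte counts above concrete, here is a small Java sketch (the class and method names are illustrative only) that prints, for one sample character from each UTF-8 length class, how many UTF-16 code units it occupies in a Java or JavaScript string and how many bytes its UTF-8 encoding needs. A character outside the Basic Multilingual Plane, such as U+1F600, takes two UTF-16 code units but four UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;

// Compares UTF-16 length (what Java and JavaScript strings count) with the
// UTF-8 byte length of one sample character per UTF-8 length class.
final class EncodingLengths {

    // Number of bytes needed to encode s in UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // 'A' (1 byte), e-acute (2 bytes), euro sign (3 bytes), emoji outside the BMP (4 bytes).
        String[] samples = { "A", "\u00E9", "\u20AC", new String(Character.toChars(0x1F600)) };
        for (String s : samples) {
            System.out.printf("U+%04X: %d UTF-16 unit(s), %d UTF-8 byte(s)%n",
                    s.codePointAt(0), s.length(), utf8Length(s));
        }
    }
}
```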
>
> > I know this. But there are two layers here: one does not understand
> > anything other than US-ASCII, and the other understands both UTF-8
> > and US-ASCII. What I am asking is whether there is any way I can put
> > some sort of translator between these two layers so that both handle
> > UTF-8 data correctly. Even the layer that supports only US-ASCII
> > would be able to handle UTF-8 data, as it would receive the output of
> > wctomb or a similar function (which I am assuming will not contain
> > '\0'); correct me if I am wrong. So the UTF-8 data gets converted to
> > some multibyte sequence without '\0', and then I can do the reverse
> > translation when reading it back. And if this can be done, which Java
> > method of the String class is appropriate for this task?
>
> I think you need to get your terminology straight, first. US-ASCII is
> a character set while UTF-8 is a multibyte character set _encoding_
> for Unicode. You cannot convert Unicode to ASCII without losing
> information.
>
Yes, I agree. I am quite new to this subject and might be using
misleading terms. I will try to be as precise as possible, though.

> If your server only allows US-ASCII (7-bit), then you need to convert
> all characters >127 to a sequence of ASCII characters. In the Java
> world we use Unicode escape sequences for this. You need a function that
> outputs \uXXXX for each UTF-16 character > 0x7F. This can also take
> care of your zero-termination problem and other control characters. An
> alternative is to use URL-encoding of UTF-8 data. GWT's URL class has
> encode and decode methods.
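A minimal Java sketch of the \uXXXX escaping described above, assuming the back end only needs printable 7-bit ASCII. `AsciiEscaper`, `escape` and `unescape` are hypothetical names, not a GWT API. Besides characters above 0x7E it also escapes control characters (so NUL never reaches the back end) and the backslash itself, which keeps the mapping reversible. Characters outside the BMP are escaped as two separate code units, as in Java source code.

```java
// Hypothetical helper for the escaping described above (not a GWT API).
final class AsciiEscaper {

    private static final String HEX = "0123456789ABCDEF";

    // Escapes every UTF-16 code unit outside printable ASCII (0x20..0x7E) as a
    // backslash, 'u' and four hex digits. NUL and other control characters are
    // covered, and the backslash itself is escaped so unescape() is unambiguous.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c <= 0x7E && c != '\\') {
                out.append(c);
            } else {
                out.append('\\').append('u')
                   .append(HEX.charAt((c >> 12) & 0xF))
                   .append(HEX.charAt((c >> 8) & 0xF))
                   .append(HEX.charAt((c >> 4) & 0xF))
                   .append(HEX.charAt(c & 0xF));
            }
        }
        return out.toString();
    }

    // Reverses escape(): in its output every backslash starts a six-character escape.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '\\') {
                out.append(c);
                i++;
            } else if (i + 6 <= s.length() && s.charAt(i + 1) == 'u') {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                throw new IllegalArgumentException("malformed escape at index " + i);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "caf\u00E9 \u20AC";
        String escaped = escape(text);
        System.out.println(escaped);
        System.out.println(unescape(escaped).equals(text));
    }
}
```

Escaping per UTF-16 code unit rather than per code point keeps both directions trivial; whether every call used here is available in GWT's JRE emulation has not been checked.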

Thanks, I will take a look at the URL class.

>
> But if by US-ASCII you mean the full 8-bit range (which is not US-ASCII),
> then you already have your safe 8-bit multibyte sequence with UTF-8.
> Zero-termination is still a problem, though, if the text can contain
> U+0000, which UTF-8 encodes as a single 0-byte. Use a simple regular
> expression to escape and unescape 0-bytes and your escape character
> (e.g. \).
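A Java sketch of the escaping suggested above, with the backslash as the escape character. Assumption: it is applied to the Java string before UTF-8 encoding; because U+0000 is the only character whose UTF-8 form contains a 0-byte, escaping that character is equivalent to escaping 0-bytes. `NulEscaper` is a hypothetical name. Escaping uses plain `String.replace`, and unescaping uses a single regular-expression pass so that an escaped backslash followed by a literal `0` is not misread as an escaped NUL.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a library API): escapes U+0000 and the escape
// character before the text is UTF-8 encoded and sent to the back end.
final class NulEscaper {

    // One escape sequence: a backslash followed by a backslash or the digit 0.
    private static final Pattern ESCAPE = Pattern.compile("\\\\([\\\\0])");

    // Backslash first, then NUL: reversing the order would double the
    // backslashes that were just added for NUL.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\0", "\\0");
    }

    // A single left-to-right regex pass, so an escaped backslash followed by a
    // literal '0' is not mistaken for an escaped NUL.
    static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String original = m.group(1).equals("0") ? "\0" : "\\";
            m.appendReplacement(out, Matcher.quoteReplacement(original));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "a\0b\\0c";
        String escaped = escape(text);
        System.out.println(escaped.indexOf('\0') < 0);
        System.out.println(unescape(escaped).equals(text));
    }
}
```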

Hmm... I am getting confused here. Let me try asking the question in
other words. I have two layers: a GWT-based (Java) front end, and a
back end built on uClibc without wchar support. What I am trying to
achieve is to keep uClibc as it is (without wchar support). The use
case is: the front end should accept US-ASCII and UTF-8 data as input
and send it to the back end, which stores it in a file and later
returns it to the front end, which should render it correctly. The
functions in the back end depend heavily on the input data being
null-terminated, and as far as possible I would like to avoid changing
them. If my GWT-based layer sends every string in such a way that it
contains no '\0' in the middle (such strings are possible, since the
front end supports UTF-8 input) but only at the end, that is enough
for me.
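The round trip described above can be simulated in plain Java. This is a hypothetical sketch, not the actual GWT or uClibc code: the front end encodes the text as UTF-8 and appends a terminating 0-byte, the "back end" only measures the string the way `strlen()` would (it never decodes it), and the front end decodes the bytes before the terminator. It rejects U+0000 rather than escaping it, which is an assumption about what the application can tolerate.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical simulation: UTF-8 bytes passed through a NUL-terminated buffer.
final class CStringRoundTrip {

    // Front end: encode as UTF-8 and append the terminating 0-byte.
    // Rejects U+0000, the only character whose UTF-8 form is a 0-byte.
    static byte[] toCString(String s) {
        if (s.indexOf('\0') >= 0) {
            throw new IllegalArgumentException("text contains U+0000");
        }
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return Arrays.copyOf(utf8, utf8.length + 1); // padding byte stays 0
    }

    // Back end (simulated): like strlen(), count bytes up to the first 0.
    static int cStrlen(byte[] buf) {
        int n = 0;
        while (buf[n] != 0) {
            n++;
        }
        return n;
    }

    // Front end again: decode only the bytes before the terminator.
    static String fromCString(byte[] buf) {
        return new String(buf, 0, cStrlen(buf), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "Gr\u00FC\u00DFe, \u65E5\u672C";
        byte[] buf = toCString(text);
        System.out.println(cStrlen(buf) + " bytes before the terminator");
        System.out.println(fromCString(buf).equals(text));
    }
}
```

The back end never has to understand the bytes; it only has to copy them verbatim up to the terminator.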

I was hoping wctomb would convert a Unicode character that, if treated
as a sequence of single-byte chars, might contain a '\0', into a
multibyte sequence (possibly more than 2 bytes if required) that
contains no '\0'. Please correct me if I am wrong.
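Independently of what wctomb does in a given C library and locale, UTF-8 itself already has the property hoped for here: in a multibyte sequence every byte, lead and continuation alike, lies in the range 0x80 to 0xFF, so a 0-byte can only come from U+0000. The Java sketch below (a hypothetical check, not part of either layer) verifies that consequence by encoding every Unicode scalar value.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Exhaustive check: which code points produce a 0-byte when UTF-8 encoded?
final class Utf8ZeroByteCheck {

    // Collects every code point (surrogates skipped) whose UTF-8 form contains a 0-byte.
    static List<Integer> codePointsWithZeroByte() {
        List<Integer> result = new ArrayList<>();
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                continue; // U+D800..U+DFFF are not scalar values; skip them
            }
            byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            for (byte b : utf8) {
                if (b == 0) {
                    result.add(cp);
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(codePointsWithZeroByte()); // prints [0]
    }
}
```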

>
> If this still does not answer your question, I need a more detailed
> description of your architecture and the different layers.

I hope this makes my question clear. If not, let me know what more
information is required.

-Rohit
You received this message because you are subscribed to the Google Groups
"Google Web Toolkit" group.
For more options, visit this group at
http://groups.google.com/group/Google-Web-Toolkit?hl=en