Sung-Gu
You are right. The examples I presented are meaningless. They are meaningless because
the URIUtil.toUsingCharset method is meaningless in the first place. I did my best to
explain why.

Again, please give me an example (or better, a unit test) demonstrating a meaningful
transformation of one Unicode string into another Unicode string using the method in
question.

Oleg

-----Original Message-----
From: Sung-Gu [mailto:[EMAIL PROTECTED]]
Sent: Monday, 27 January 2003 06:01
To: Commons HttpClient Project
Subject: Re: The use of UTIUtil.toUsingCharset?


Hi,

I'm sorry that I missed your point...
You're interested only in single-byte encodings with Unicode.
I hadn't realized that...

That's why you haven't seen the correct use and display of that method.
I guessed as much, though. (That's why I tried to display the byte code values.)

I'd also like to point out that your examples below are not
a correct use of the method...   They're meaningless...
For display (which is what you want, I guess), you should use a code set
or charset supported by your operating system, or ISO-8859-1.
UTF-8 is meant to be used only as a transformation
for storage and transmission.
If you want to use Unicode for display, ISO-10646 is
fully supported, and the transformation to UTF-8 should be applied
from UCS....

I added a TODO comment for it at simple diagram 2 in the text file.
 It was not really my issue before.
(As you know, I'm interested in double-byte encodings...
 and that would be the general way to solve character encoding.)
I'll do it sometime later...

Sung-Gu

----- Original Message -----
From: <[EMAIL PROTECTED]>
Subject: Re: The use of UTIUtil.toUsingCharset?


Please take no offense, but the URIUtil.toUsingCharset method still does not
make even the slightest sense to me. Your example shows how to invoke this
method but does not explain what it is useful for, apart from garbling
Unicode strings.

Have a look at a simpler example. Here I attempt to (supposedly) convert
"Zürich" from one encoding into another. However, as you can see,
URIUtil.toUsingCharset() always produces garbage.

===================================================================
public static void main(String[] args) throws Exception
{
  System.out.println(
    URIUtil.toUsingCharset("Zürich", "UTF-8", "US-ASCII"));
  System.out.println(
    URIUtil.toUsingCharset("Zürich", "ASCII", "UTF-8"));
  System.out.println(
    URIUtil.toUsingCharset("Zürich", "UTF-8", "ISO-8859-1"));
  System.out.println(
    URIUtil.toUsingCharset("Zürich", "ISO-8859-1", "UTF-8"));
}


Output:

Z��rich
Z?rich
Z�&#131;¼rich
Z�

=================================================================

Java uses 16 bits to represent characters. Therefore the concept of character
encoding is only applicable when working with arrays of bytes (8-bit units)
that represent a sequence of characters. One indeed needs to take character
encoding into account when converting from byte[] to String or vice versa.
However, converting from a Unicode String to an array of bytes and back to a
Unicode String using a different encoding (especially in one method call), in my
opinion, does not produce any sensible results.
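
The point about byte arrays can be illustrated without URIUtil at all. The
following is a minimal sketch for a current JDK, assuming that the conversion
in question amounts to encoding a String with one charset and decoding the
resulting bytes with another. The class CharsetRoundTrip and its method
reinterpret are hypothetical names made up for this illustration; they are not
part of HttpClient.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Commons HttpClient.
class CharsetRoundTrip {

    // Encodes s to bytes with one charset, then decodes those bytes with another.
    // Assumption: this mirrors the String -> byte[] -> String conversion discussed above.
    static String reinterpret(String s, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = s.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // "Zürich", written with an escape so the source file encoding does not matter.
        String city = "Z\u00fcrich";

        // Same charset in both directions: the original string comes back.
        System.out.println(
            city.equals(reinterpret(city, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));

        // Mismatched charsets: the two UTF-8 bytes of U+00FC are decoded as two
        // separate ISO-8859-1 characters (U+00C3, U+00BC), so the text is garbled.
        String garbled = reinterpret(city, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals("Z\u00c3\u00bcrich"));

        // US-ASCII cannot represent U+00FC, so the character is lost and replaced by '?'.
        System.out.println(
            reinterpret(city, StandardCharsets.US_ASCII, StandardCharsets.UTF_8));

        // The case where a charset does matter: raw bytes arriving from outside
        // (here ISO-8859-1 encoded) are decoded with the charset they were written in.
        byte[] wire = {0x5A, (byte) 0xFC, 0x72, 0x69, 0x63, 0x68};
        String decoded = new String(wire, StandardCharsets.ISO_8859_1);
        System.out.println(decoded.equals(city));
    }
}
```

Only the last case, where bytes are decoded with the charset they were actually
written in, recovers the intended text from an external representation. The
mismatched calls either garble or lose the non-ASCII character.
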

If you see things differently, please help me understand what
URIUtil.toUsingCharset() can be USEFUL for.

Cheers

Oleg

--
To unsubscribe, e-mail:
<mailto:[EMAIL PROTECTED]>
For additional commands, e-mail:
<mailto:[EMAIL PROTECTED]>
