Julien ---
Deleting the method outright would be cleaner, but the method is
public, so removing it could break API compatibility. Making
it a no-op is safer.
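
If it helps, here is a minimal sketch of what I mean. The class name ContextSketch is made up and stands in for the real class, and the @Deprecated marker is only my suggestion for signalling that callers can drop the call; it is not something the project has agreed to.

```java
// Hypothetical stand-in for the real class; only the one method is shown.
class ContextSketch {

    /**
     * @deprecated Java Strings need no re-encoding, so this method now
     *             returns its argument unchanged.
     */
    @Deprecated
    public String toEncodedString(String text) {
        return text;
    }

    public static void main(String[] args) {
        String euro = "\u20AC";
        // The no-op hands back exactly what it was given.
        System.out.println(euro.equals(new ContextSketch().toEncodedString(euro)));
    }
}
```

Existing callers keep compiling, and the method could still be removed in a later release if nobody objects.
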
As for a test case, creating one is very simple:
1. Load any webpage containing multi-byte characters encoded
as UTF-8, such as the euro character (3 bytes in UTF-8).
2. WebTester.assertTextPresent(theCharacters)
fails when it shouldn't when executed on any machine
with ISO 8859-1 as the system character set. I'm
not sure how to set the character set on a given machine,
but you can get Java to tell you yours with this expression:
java.nio.charset.Charset.defaultCharset()
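
Here is a rough sketch of how the corruption could be reproduced without depending on the machine's default charset. It is not JWebUnit code: the class EncodingRoundTrip and the helper roundTrip are names I made up, and the helper only mirrors the getBytes()/new String() sequence from toEncodedString() with the "platform default" charset passed in explicitly. It assumes a JDK that has java.nio.charset.StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JWebUnit.
class EncodingRoundTrip {

    // Mirrors toEncodedString(), but takes the "platform default" charset
    // as a parameter so the result does not depend on the machine.
    static String roundTrip(String text, Charset platformDefault, Charset encodingScheme) {
        byte[] bytes = text.getBytes(platformDefault);
        return new String(bytes, encodingScheme);
    }

    public static void main(String[] args) {
        String euro = "\u20AC";
        System.out.println("Default charset here: " + Charset.defaultCharset());

        // ISO-8859-1 cannot represent the euro sign, so it is replaced while encoding.
        String lossy = roundTrip(euro, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println("ISO-8859-1 then UTF-8 intact: " + euro.equals(lossy));

        // Three UTF-8 bytes decoded as ISO-8859-1 become three separate chars.
        String mixed = roundTrip(euro, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 then ISO-8859-1 length: " + mixed.length());

        // Only when both charsets agree does the text survive.
        String same = roundTrip(euro, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println("UTF-8 then UTF-8 intact: " + euro.equals(same));
    }
}
```

A real test would still need to go through WebTester, but this at least pins down the two failure modes in isolation.
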
If you'd like me to draft this up as a test case, let me
know and I'll do it.
Cheers,
Jesse
On 6/8/06, Julien HENRY <[EMAIL PROTECTED]> wrote:
>
>
> Hi Jesse,
>
> I don't know exactly why this method was introduced, but perhaps the elders
> can explain the reasons. If this method is useless, it should be
> deleted. I disagree with having a no-op method.
> Perhaps some test cases should highlight the problem...
>
> ++
> Julien
>
> ----- Original Message ----
> From: Jesse Wilson <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Thursday, 8 June 2006, 3:10:20
> Subject: [Jwebunit-users] Context.toEncodedString() doesn't make sense
>
>
> Hi JWebUnit team!
>
> I'm a new user of JWebUnit and I'm having problems using
> it with multi-byte characters, the Euro character in particular.
>
> The toEncodedString() method supposedly converts a String from
> one encoding to another:
>     public String toEncodedString(String text) {
>         try {
>             return new String(text.getBytes(), encodingScheme);
>         } catch (UnsupportedEncodingException e) {
>             e.printStackTrace();
>             return text;
>         }
>     }
>
> Unfortunately, this doesn't make sense. Internally, all Strings
> in Java are UTF-16, regardless of what encoding they were
> in when you created them from bytes. The String constructors
> automatically convert bytes from their specified encoding to UTF-16.
> There's no reason to worry about encoding of Java Strings until
> you're reading from or writing to bytes, since all Strings are
> the same encoding internally.
>
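
To illustrate the point above, here is a small sketch showing that the same character decoded from two different byte encodings ends up as the same String. The class and method names are made up for illustration; the byte values are the standard UTF-8 and big-endian UTF-16 encodings of the euro sign.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only; the class name is made up.
class SameStringDemo {

    // The euro sign encoded as UTF-8 (three bytes).
    static final byte[] EURO_UTF8 = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC};
    // The euro sign encoded as big-endian UTF-16 (two bytes).
    static final byte[] EURO_UTF16BE = {(byte) 0x20, (byte) 0xAC};

    static String fromUtf8() {
        return new String(EURO_UTF8, StandardCharsets.UTF_8);
    }

    static String fromUtf16() {
        return new String(EURO_UTF16BE, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // Different bytes on the way in, identical Strings once decoded.
        System.out.println(fromUtf8().equals(fromUtf16()));
        System.out.println(fromUtf8().length());
    }
}
```

Once the bytes are decoded with the right charset, nothing about the original encoding is left in the String to "convert".
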
> Since toEncodedString() does not convert Strings from
> one encoding to another, what does it do?
> 1. text.getBytes() converts the UTF-16 String into a byte[] array
> using the platform's default character encoding. On my Linux
> box, the default character encoding scheme is ISO-8859-1, a
> poor choice because it can represent at most 256 distinct characters.
> This means that whenever a character outside that set is encoded,
> information will be lost.
> 2. new String(platformEncodedBytes, encodingScheme) takes an
> array of bytes encoded in my platform's default scheme, and uses
> a potentially different encoding scheme to convert it to a proper
> Java UTF-16 String. If the encoding charset and the decoding
> charset differ, we can get data corruption in this step.
>
> For example, on my Red Hat box (ISO-8859-1), the euro character is
> converted to a "?":
>     String euro = "\u20AC";
>     byte[] euroAsBytes = euro.getBytes("ISO-8859-1");
>     String euroEncoded = new String(euroAsBytes, "ISO-8859-1"); // equals "?"
> On my coworker's Ubuntu box (UTF-8), the euro character is converted to
> three garbage characters:
>     String euro = "\u20AC";
>     byte[] euroAsBytes = euro.getBytes("UTF-8"); // array length is 3
>     String euroEncoded = new String(euroAsBytes, "ISO-8859-1"); // 3 garbage chars
>
> Since the encoding is unnecessary, I strongly recommend replacing
> the method's implementation with a no-op:
>     public String toEncodedString(String text) {
>         return text;
>     }
>
> Thanks in advance,
> Jesse Wilson
>
>
> _______________________________________________
> Jwebunit-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/jwebunit-users