Re: [Jwebunit-users] Re : Context.toEncodedString() doesn't make sense

Jesse Wilson Thu, 08 Jun 2006 01:10:13 -0700

Julien ---

Deleting the method outright is cleaner, but the method is
public, so doing so could break API compatibility. Making
it a no-op is safer.


As for a test case, creating one is very simple:
 1. Load any webpage with multi-byte characters encoded
     as UTF-8, such as the euro character
 2. WebTester.assertTextPresent(theCharacters)
    fails when it shouldn't on any machine with
    ISO-8859-1 as the system character set. I'm
    not sure how to set the character set on a given machine,
    but you can get Java to tell you yours with this call:
      java.nio.charset.Charset.defaultCharset()
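
To check quickly whether a given machine would reproduce the failure, one option is to ask the default charset whether it can encode the euro character at all. This is only a sketch: the class name DefaultCharsetCheck and the helper canEncode are made up for illustration and are not part of JWebUnit.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {
    /** Returns true if the given charset can represent every char in text. */
    public static boolean canEncode(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String euro = "\u20AC";
        Charset platform = Charset.defaultCharset();
        System.out.println("Default charset: " + platform.name());
        // false on an ISO-8859-1 machine, true on a UTF-8 machine
        System.out.println("Can encode euro: " + canEncode(platform, euro));
    }
}
```

If the second line prints false, the machine should be a good candidate for running the test case.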

If you'd like me to draft this up as a test case, let me
know and I'll do it.

Cheers,
Jesse




On 6/8/06, Julien HENRY <[EMAIL PROTECTED]> wrote:
>
>
> Hi Jesse,
>
> I don't know exactly why this method was introduced, but perhaps the elders
> can explain the reasons. If this method is useless, it should be
> deleted. I disagree with having a no-op method.
> Perhaps some test cases should highlight the problem...
>
> ++
> Julien
>
> ----- Original Message ----
> From: Jesse Wilson <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Thursday, 8 June 2006, 3:10:20
> Subject: [Jwebunit-users] Context.toEncodedString() doesn't make sense
>
>
> Hi JWebUnit team!
>
> I'm a new user of JWebUnit and I'm having problems using
> it with multi-byte characters, the Euro character in particular.
>
> The toEncodedString() method supposedly converts a String from
> one encoding to another:
>   public String toEncodedString(String text) {
>     try {
>       return new String(text.getBytes(), encodingScheme);
>     } catch (UnsupportedEncodingException e) {
>       e.printStackTrace();
>       return text;
>     }
>   }
>
> Unfortunately, this doesn't make sense. Internally, all Strings
> in Java are UTF-16, regardless of what encoding they were
> in when you created them from bytes. The String constructors
> automatically convert bytes from their specified encoding to UTF-16.
> There's no reason to worry about encoding of Java Strings until
> you're reading from or writing to bytes, since all Strings are
> the same encoding internally.
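
A small sketch of this point, with hypothetical names (StringEncodingDemo and decode are illustrative only): the same text decoded from UTF-8 bytes and from UTF-16 bytes yields equal Strings, because a String keeps no record of the encoding its bytes came from.

```java
import java.nio.charset.Charset;

public class StringEncodingDemo {
    /** Decodes bytes into a String using the named charset. */
    public static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The euro sign as UTF-8 (3 bytes) and as big-endian UTF-16 (2 bytes).
        byte[] utf8 = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC};
        byte[] utf16 = {(byte) 0x20, (byte) 0xAC};

        String fromUtf8 = decode(utf8, "UTF-8");
        String fromUtf16 = decode(utf16, "UTF-16BE");

        // Both hold the single char U+20AC.
        System.out.println(fromUtf8.equals(fromUtf16)); // true
        System.out.println(fromUtf8.length());          // 1
    }
}
```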
>
> Since toEncodedString() does not convert Strings from
> one encoding to another, what does it do?
> 1. text.getBytes() converts the UTF-16 String into a byte[]
>    using the platform's default character encoding. On my Linux
>    box, the default character encoding scheme is ISO-8859-1, a
>    poor choice because it supports only 256 distinct characters.
>    This means whenever a character outside that set is encoded,
>    information will be lost.
> 2. new String(platformEncodedString, encodingScheme) takes an
>    array of bytes encoded in my platform's default scheme, and uses
>    a potentially different encoding scheme to convert it to a proper
>    Java UTF-16 String. If the encoding charset and the decoding
>    charset differ, we can get data corruption on this step.
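
The two steps can be tried on any machine by passing the platform charset explicitly instead of relying on the default. The following is only a sketch: RoundTripDemo and roundTrip are illustrative names, and the platform parameter stands in for whatever charset text.getBytes() would pick up implicitly.

```java
import java.nio.charset.Charset;

public class RoundTripDemo {
    /**
     * Mimics toEncodedString(): encode with the platform charset (step 1),
     * then decode with the configured encoding scheme (step 2).
     */
    public static String roundTrip(String text, String platform, String scheme) {
        byte[] bytes = text.getBytes(Charset.forName(platform)); // step 1
        return new String(bytes, Charset.forName(scheme));       // step 2
    }

    public static void main(String[] args) {
        String euro = "\u20AC";
        // Step 1 loses the character: ISO-8859-1 has no euro sign.
        System.out.println(roundTrip(euro, "ISO-8859-1", "ISO-8859-1").equals(euro)); // false
        // Mismatched charsets: three UTF-8 bytes decode to three chars.
        System.out.println(roundTrip(euro, "UTF-8", "ISO-8859-1").length()); // 3
        // The text survives only when both charsets match and can represent it.
        System.out.println(roundTrip(euro, "UTF-8", "UTF-8").equals(euro)); // true
    }
}
```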
>
> For example, on my Red Hat box (ISO-8859-1), the euro character is
> converted to a "?":
>         String euro = "\u20AC";
>         byte[] euroAsBytes = euro.getBytes("ISO-8859-1");
>         String euroEncoded = new String(euroAsBytes, "ISO-8859-1"); // equals "?"
> On my coworker's Ubuntu box (UTF-8), the euro character is converted to
> three garbage characters:
>         String euro = "\u20AC";
>         byte[] euroAsBytes = euro.getBytes("UTF-8"); // array length is 3
>         String euroEncoded = new String(euroAsBytes, "ISO-8859-1"); // 3 garbage chars
>
> Since the encoding is unnecessary, I strongly recommend replacing
> the method's implementation with a no-op:
>   public String toEncodedString(String text) {
>     return text;
>   }
>
> Thanks in advance,
> Jesse Wilson
>
>
> _______________________________________________
> Jwebunit-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/jwebunit-users
>
>
>

