Re: [Jwebunit-users] Re : Context.toEncodedString() doesn't make sense

Julien Henry Thu, 08 Jun 2006 06:54:09 -0700

Jesse,

Could you please provide a TestCase that uses special characters? That 
way, I can check whether it works before and after the modification.


Thanks

Julien

Jesse Wilson wrote:
> Julien ---
>
> Deleting the method outright is cleaner, but the method is
> public, so doing so could break API compatibility. Making
> it a no-op is safer.
>
> As for a test case, creating one is very simple:
>  1. Load any web page containing multi-byte characters encoded
>     as UTF-8, such as the euro character (3 bytes in UTF-8).
>  2. WebTester.assertTextPresent(theCharacters) then fails
>     when it shouldn't on any machine whose system character
>     set is ISO-8859-1. I'm not sure how to set the character
>     set on a given machine, but you can get Java to tell you
>     yours with this call:
>       java.nio.charset.Charset.defaultCharset()
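One quick way to tell whether a given machine loses the euro sign in text.getBytes() is to ask its default charset whether it can encode that character. A minimal sketch using only java.nio.charset; the class DefaultCharsetCheck and its helper canEncodeEuro() are hypothetical names, and the check covers only the getBytes() step, not the later decoding with the context's encoding scheme:

```java
import java.nio.charset.Charset;

/** Reports whether this JVM's default charset can represent the euro sign (U+20AC). */
class DefaultCharsetCheck {

    /** True if the given charset supports encoding and can encode the euro sign. */
    static boolean canEncodeEuro(Charset charset) {
        // Some charsets are decode-only; newEncoder() would throw for them.
        return charset.canEncode() && charset.newEncoder().canEncode('\u20AC');
    }

    public static void main(String[] args) {
        Charset platform = Charset.defaultCharset();
        System.out.println("Default charset: " + platform);
        if (canEncodeEuro(platform)) {
            System.out.println("getBytes() keeps the euro sign on this machine.");
        } else {
            System.out.println("getBytes() replaces the euro sign on this machine.");
        }
    }
}
```

On a machine reporting ISO-8859-1 the second message should appear, matching the failing case described above.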
>
> If you'd like me to draft this up as a test case, let me
> know and I'll do it.
>
> Cheers,
> Jesse
>
>
>
>
> On 6/8/06, Julien HENRY <[EMAIL PROTECTED]> wrote:
>   
>> Hi Jesse,
>>
>> I don't know exactly why this method was introduced, but perhaps the elders
>> can explain the reasons. If this method is useless, it should be
>> deleted. I disagree with having a no-op method.
>> Perhaps some test cases could highlight the problem...
>>
>> Cheers,
>> Julien
>>
>> ----- Original Message ----
>> From: Jesse Wilson <[EMAIL PROTECTED]>
>> To: [email protected]
>> Sent: Thursday, June 8, 2006, 03:10:20
>> Subject: [Jwebunit-users] Context.toEncodedString() doesn't make sense
>>
>>
>> Hi JWebUnit team!
>>
>> I'm a new user of JWebUnit and I'm having problems using
>> it with multi-byte characters, the Euro character in particular.
>>
>> The toEncodedString() method supposedly converts a String from
>> one encoding to another:
>>   public String toEncodedString(String text) {
>>     try {
>>       return new String(text.getBytes(), encodingScheme);
>>     } catch (UnsupportedEncodingException e) {
>>       e.printStackTrace();
>>       return text;
>>     }
>>   }
>>
>> Unfortunately, this doesn't make sense. Internally, all Strings
>> in Java are UTF-16, regardless of what encoding they were
>> in when you created them from bytes. The String constructors
>> automatically convert bytes from their specified encoding to UTF-16.
>> There's no reason to worry about the encoding of Java Strings until
>> you're reading from or writing to bytes, since all Strings share
>> the same internal encoding.
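The point about decoded Strings can be checked directly: the same text written as UTF-8 and as UTF-16BE produces byte arrays of different lengths, yet each decodes back to an equal String. A small sketch; the class DecodedStringsAreEqual and its roundTrip() helper are illustrative names, not part of JWebUnit:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Shows that a decoded String does not remember which byte encoding it came from. */
class DecodedStringsAreEqual {

    /** Encodes text with the given charset, then decodes it with the same charset. */
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String euro = "\u20AC";
        // The byte representations differ...
        System.out.println("UTF-8 bytes: " + euro.getBytes(StandardCharsets.UTF_8).length);       // prints 3
        System.out.println("UTF-16BE bytes: " + euro.getBytes(StandardCharsets.UTF_16BE).length); // prints 2
        // ...but once decoded with the matching charset, both give the same String.
        String fromUtf8 = roundTrip(euro, StandardCharsets.UTF_8);
        String fromUtf16 = roundTrip(euro, StandardCharsets.UTF_16BE);
        System.out.println("Equal after decoding: " + fromUtf8.equals(fromUtf16)); // prints true
    }
}
```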
>>
>> Since toEncodedString() does not convert Strings from
>> one encoding to another, what does it actually do?
>> 1. text.getBytes() converts the UTF-16 String into a byte array
>>    using the platform's default character encoding. On my Linux
>>    box, the default character encoding is ISO-8859-1, a poor
>>    choice because it supports only 256 distinct characters
>>    (U+0000 to U+00FF). This means that whenever a character
>>    outside that range, such as the euro sign, is encoded,
>>    information is lost.
>> 2. new String(platformEncodedBytes, encodingScheme) takes an
>>    array of bytes encoded in my platform's default scheme and uses
>>    a potentially different encoding scheme to decode it into a
>>    Java UTF-16 String. When the encoding charset and the decoding
>>    charset differ, this step can corrupt the data.
>>
>> For example, on my Red Hat box (ISO-8859-1), the euro character is
>> converted to "?":
>>         String euro = "\u20AC";
>>         byte[] euroAsBytes = euro.getBytes("ISO-8859-1");
>>         String euroEncoded = new String(euroAsBytes, "ISO-8859-1"); // equals "?"
>>
>> On my coworker's Ubuntu box (UTF-8), the euro character is converted to
>> three garbage characters:
>>         String euro = "\u20AC";
>>         byte[] euroAsBytes = euro.getBytes("UTF-8"); // array length is 3
>>         String euroEncoded = new String(euroAsBytes, "ISO-8859-1"); // 3 garbage chars
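Both results can be reproduced in one JVM by passing the platform default charset in explicitly instead of relying on the no-argument text.getBytes(). The three-argument legacyToEncodedString() below is a hypothetical helper that mirrors the two steps of toEncodedString(); it is not part of JWebUnit's API. It uses String.getBytes(Charset), which is documented to replace unmappable characters with the charset's replacement bytes ('?' for ISO-8859-1):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Simulates toEncodedString() on two machines with different default charsets. */
class LegacyConversionDemo {

    /** Hypothetical: the original two steps, with the platform default made explicit. */
    static String legacyToEncodedString(String text, Charset platformDefault, Charset encodingScheme) {
        // Step 1: encode with the platform default; unmappable characters become '?'.
        byte[] bytes = text.getBytes(platformDefault);
        // Step 2: decode with a possibly different charset.
        return new String(bytes, encodingScheme);
    }

    public static void main(String[] args) {
        String euro = "\u20AC";
        // Red Hat box: ISO-8859-1 default, ISO-8859-1 encoding scheme.
        String redHat = legacyToEncodedString(euro, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);
        // Ubuntu box: UTF-8 default, ISO-8859-1 encoding scheme.
        String ubuntu = legacyToEncodedString(euro, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("Red Hat: \"" + redHat + "\"");          // prints Red Hat: "?"
        System.out.println("Ubuntu: " + ubuntu.length() + " chars"); // prints Ubuntu: 3 chars
    }
}
```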
>>
>> Since the conversion is unnecessary, I strongly recommend replacing
>> the method's implementation with a no-op:
>>   public String toEncodedString(String text) {
>>     return text;
>>   }
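Julien's request for a test case can be approximated without JWebUnit: decode a UTF-8 page correctly, then search it for the expected text after passing that text through either the current conversion or the proposed no-op. This is a sketch under stated assumptions: the page content is made up, the ISO-8859-1 platform default is simulated, the class NoOpComparison and its methods are hypothetical, and it assumes assertTextPresent() runs the expected text through toEncodedString(), which the thread does not confirm:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Compares the current conversion with the proposed no-op when searching a
 * decoded page for expected text. Not JWebUnit's actual assertTextPresent().
 */
class NoOpComparison {

    /** Current behavior, with the platform default made explicit (hypothetical signature). */
    static String legacy(String text, Charset platformDefault, Charset encodingScheme) {
        return new String(text.getBytes(platformDefault), encodingScheme);
    }

    /** Proposed behavior: the String is already decoded, so return it unchanged. */
    static String noOp(String text) {
        return text;
    }

    public static void main(String[] args) {
        // A made-up page served as UTF-8 and decoded with the correct charset.
        byte[] pageBytes = "<p>Price: 10 \u20AC</p>".getBytes(StandardCharsets.UTF_8);
        String page = new String(pageBytes, StandardCharsets.UTF_8);
        String expected = "10 \u20AC";

        // Simulate a machine whose default charset is ISO-8859-1.
        boolean legacyFound = page.contains(
                legacy(expected, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1));
        boolean noOpFound = page.contains(noOp(expected));
        System.out.println("legacy conversion finds text: " + legacyFound); // prints false
        System.out.println("no-op finds text: " + noOpFound);               // prints true
    }
}
```

A real JWebUnit TestCase would load a served page and call assertTextPresent() instead; the sketch only isolates the String conversion that differs between the two implementations.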
>>
>> Thanks in advance,
>> Jesse Wilson
>>
>>
>> _______________________________________________
>> Jwebunit-users mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/jwebunit-users
>>
>

This message contains information that may be privileged or confidential and is 
the property of the Capgemini Group. It is intended only for the person to whom 
it is addressed. If you are not the intended recipient, you are not authorized 
to read, print, retain, copy, disseminate, distribute, or use this message or 
any part thereof. If you receive this message in error, please notify the 
sender immediately and delete all copies of this message.


