Hi Jongjin,
Let me clarify:
Is the switch only for the Admin Service and Client, global for the
whole application, or per application?
From the i18n point of view, I hope Axis always works fine with
all languages using the default settings.
Thanks,
Toshi <[EMAIL PROTECTED]>
On Wed, 19 Jan 2005, Jongjin Choi wrote:
> Hi, Toshi and all.
>
> I'd like to propose the following for backward compatibility:
> - keep escaping as the default
> - add a runtime option (an Axis property in the WSDD) for switching to
> no-escaping.
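Here is a minimal sketch of the proposed switch. It assumes a single boolean flag that defaults to escaping. The class name `SwitchableXmlEncoder` and the flag `escapeNonAscii` are hypothetical. This is not Axis's `UTF8Encoder`, and the sketch does not show how the flag would be read from a WSDD property. The invalid-character check is also simplified to C0 control characters.

```java
/**
 * Hypothetical sketch of a switchable XML text encoder; not Axis's UTF8Encoder.
 * escapeNonAscii = true keeps the current behavior (the proposed default);
 * false writes characters as-is and leaves byte encoding to the Writer.
 */
public class SwitchableXmlEncoder {
    private final boolean escapeNonAscii;

    public SwitchableXmlEncoder(boolean escapeNonAscii) {
        this.escapeNonAscii = escapeNonAscii;
    }

    /** Default keeps escaping on, for backward compatibility. */
    public SwitchableXmlEncoder() {
        this(true);
    }

    public String encode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        // Iterate by code point so a surrogate pair becomes one reference, not two.
        s.codePoints().forEach(cp -> {
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    if (cp < 0x20 && cp != '\t' && cp != '\n' && cp != '\r') {
                        // C0 control characters (other than tab, LF, CR) are not allowed in XML 1.0.
                        throw new IllegalArgumentException("invalid XML character 0x" + Integer.toHexString(cp));
                    } else if (escapeNonAscii && cp > 0x7F) {
                        out.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
                    } else {
                        out.appendCodePoint(cp);
                    }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String msg = "\u51E6\u7406 & done";
        System.out.println(new SwitchableXmlEncoder().encode(msg));      // &#x51E6;&#x7406; &amp; done
        System.out.println(new SwitchableXmlEncoder(false).encode(msg)); // characters written as-is
    }
}
```

Keeping the default at `true` means nobody relying on the current output sees a change unless they opt out.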
>
> The current behavior causes no problem for an application handling the
> SOAP message. I just pointed out that the message can be somewhat
> larger with escaping.
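To put a rough number on "somewhat larger": each escaped BMP character above U+07FF becomes an 8-character `&#xHHHH;` reference, which is 8 bytes in UTF-8. Written as-is, the same character takes 3 bytes. The sketch below measures this for the Japanese admin message quoted in this thread (処理を実行しました). `EscapedSizeDemo` is a hypothetical name, and its escaping mimics the removed `&#x` branch.

```java
import java.nio.charset.StandardCharsets;

public class EscapedSizeDemo {
    /** Escapes every code point above 0x7F as &#xHHHH; (uppercase hex), like the removed branch. */
    public static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                sb.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    /** Number of bytes the text occupies on the wire as UTF-8. */
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // The nine characters of the Japanese admin message.
        String jp = "\u51E6\u7406\u3092\u5B9F\u884C\u3057\u307E\u3057\u305F";
        System.out.println("as-is:   " + utf8Length(jp) + " bytes");                 // as-is:   27 bytes
        System.out.println("escaped: " + utf8Length(escapeNonAscii(jp)) + " bytes"); // escaped: 72 bytes
    }
}
```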
>
> But in this case, the admin client (AdminClient.java) seems to write
> the content of the SOAP body directly to the console. I think the switch
> can be applied to the Admin Service and Client.
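Suppose the escaped text is only a problem when it is printed. Then one alternative, not something proposed in this thread, is for a console client to decode numeric character references just before display. The helper below is hypothetical and assumes the output is only shown to a person and never re-parsed as XML. It leaves `&amp;`, `&lt;` and the other predefined entities untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharRefDecoder {
    // Hexadecimal (&#xHHHH;) or decimal (&#NNNN;) numeric character references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:x([0-9A-Fa-f]+)|([0-9]+));");

    /** Replaces numeric character references with the characters they stand for (display only). */
    public static String decodeForDisplay(String text) {
        Matcher m = NUMERIC_REF.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int cp = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            // quoteReplacement stops a decoded '$' or backslash being read as a group reference.
            m.appendReplacement(sb, Matcher.quoteReplacement(new String(Character.toChars(cp))));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String fromServer = "<Admin>&#x51E6;&#x7406; / [en]-(Done processing)</Admin>";
        System.out.println(decodeForDisplay(fromServer));
    }
}
```

This treats the symptom on one client only. The server-side switch discussed here addresses the message itself.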
>
> Any thought?
>
> /Jongjin
>
> ----- Original Message -----
> From: "Toshiyuki Kimura" <[EMAIL PROTECTED]>
> To: <[email protected]>
> Cc: "Changshin Lee" <[EMAIL PROTECTED]>;
> "Jongjin Choi" <[EMAIL PROTECTED]>
> Sent: Wednesday, January 19, 2005 12:41 PM
> Subject: Re: UTF8Encoder question...
>
>
>> Hi Ias, Jongjin and all,
>>
>> Sorry for cutting in. I'd like to know the conclusion.
>>
>> As you may know, I'm now working on the i18n of Axis, and the
>> Japanese Axis Community has already made Japanese resources.
>> During testing, I ran into a UTF-8 encoding problem.
>>
>> With the latest CVS code, I get an escaped message from the
>> server-side Axis as follows:
>>
>> <Admin>&#x51E6;&#x7406;&#x3092;&#x5B9F;&#x884C;&#x3057;&#x307E;&#x3057;&#x305F; / [en]-(Done processing)</Admin>
>>
>> instead of
>>
>> <Admin>[Japanese Message] / [en]-(Done processing)</Admin>
>>
>> As a side note, I got valid Japanese characters when I
>> applied Jongjin's patch to my local 'UTF8Encoder.java'.
>>
>> Any thought?
>>
>> Regards,
>> Toshi <[EMAIL PROTECTED]>
>>
>> On Thu, 30 Dec 2004, Changshin Lee wrote:
>>
>>>> Ias and all,
>>>>
>>>> If you revive the commented-out (and since removed) code of UTF8Encoder, that is:
>>>> /*
>>>> TODO: Try fixing this block instead of code above.
>>>> if (character < 0x80) {
>>>> writer.write(character);
>>>> } else if (character < 0x800) {
>>>> writer.write((0xC0 | character >> 6));
>>>> writer.write((0x80 | character & 0x3F));
>>>> } else if (character < 0x10000) {
>>>> writer.write((0xE0 | character >> 12));
>>>> writer.write((0x80 | character >> 6 & 0x3F));
>>>> writer.write((0x80 | character & 0x3F));
>>>> } else if (character < 0x200000) {
>>>> writer.write((0xF0 | character >> 18));
>>>> writer.write((0x80 | character >> 12 & 0x3F));
>>>> writer.write((0x80 | character >> 6 & 0x3F));
>>>> writer.write((0x80 | character & 0x3F));
>>>> }
>>>> */
>>>> and comment out the current escaping code, all-tests will fail.
>>>> As I mentioned, this code would be needed for an OutputStream, not a Writer.
>>>> Here a Writer is used, so the code can simply be rewritten (as in UTF16Encoder) as
>>>>
>>>> writer.write(character);
>>>>
>>>> I think all-tests will then succeed. (I can't verify this now because all-tests currently fails with the CVS code.)
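The Writer-versus-OutputStream point can be checked directly. This sketch, with the hypothetical class name `WriterVsStream`, writes U+51E6 in three ways:

- it passes the hand-rolled UTF-8 byte values from the old commented block to a UTF-8 `Writer`;
- it writes the same byte values to a raw `OutputStream`;
- it passes the character itself to the `Writer`.

The first way double-encodes. The other two produce the correct 3 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WriterVsStream {
    /** Old approach for a char >= 0x800: split into UTF-8 byte values, but hand them to a Writer. */
    public static byte[] manualBytesThroughWriter(char c) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            // The Writer treats each "byte" as a char in U+0080..U+00FF and encodes it again.
            w.write(0xE0 | c >> 12);
            w.write(0x80 | c >> 6 & 0x3F);
            w.write(0x80 | c & 0x3F);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Same byte values written to a raw OutputStream: the case the old code was written for. */
    public static byte[] manualBytesToStream(char c) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0xE0 | c >> 12);
        out.write(0x80 | c >> 6 & 0x3F);
        out.write(0x80 | c & 0x3F);
        return out.toByteArray();
    }

    /** Writer approach: write the char and let the Writer do the UTF-8 encoding. */
    public static byte[] charThroughWriter(char c) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        char c = '\u51E6';
        System.out.println(manualBytesThroughWriter(c).length); // 6 (double-encoded)
        System.out.println(manualBytesToStream(c).length);      // 3
        System.out.println(Arrays.equals(charThroughWriter(c), manualBytesToStream(c))); // true
    }
}
```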
>>>>
>>>
>>> Could you run all-tests except the chronically failing ones (by adding
>>> them to the excluded list)? If the result is clean, I'm for the change (and
>>> it's easy to revert as well, so commit it :-).
>>>
>>>> As for the readability of SOAP messages, I don't think that is Axis's responsibility.
>>>
>>> Human readability is one of the essentials of XML (and SOAP). If a
>>> SOAP processor takes a SOAP input message that is readable to a user,
>>> then the output of the processing, as SOAP, must also be readable
>>> to that user. Therefore, when people use Axis as a SOAP processor, they
>>> will blame Axis for a result containing characters that are unreadable
>>> or broken for them. It's not entirely up to Axis, but Axis can cause it,
>>> and Axis should guarantee that there's no distortion in terms of
>>> readability from Alpha to Omega of SOAP processing.
>>>
>>> Ias
>>>
>>>>
>>>> This is the diff:
>>>> cvs diff -u UTF8Encoder.java
>>>> Index: UTF8Encoder.java
>>>> ===================================================================
>>>> RCS file: /home/cvspublic/ws-axis/java/src/org/apache/axis/components/encoding/UTF8Encoder.java,v
>>>> retrieving revision 1.4
>>>> diff -u -r1.4 UTF8Encoder.java
>>>> --- UTF8Encoder.java 4 Nov 2004 18:23:12 -0000 1.4
>>>> +++ UTF8Encoder.java 30 Dec 2004 01:20:03 -0000
>>>> @@ -82,10 +82,6 @@
>>>> "invalidXmlCharacter00",
>>>> Integer.toHexString(character),
>>>> xmlString));
>>>> - } else if (character > 0x7F) {
>>>> - writer.write("&#x");
>>>> - writer.write(Integer.toHexString(character).toUpperCase());
>>>> - writer.write(";");
>>>> } else {
>>>> writer.write(character);
>>>> }
>>>>
>>>>
>>>> /Jongjin
>>>>
>>>> ----- Original Message -----
>>>> From: "Changshin Lee" <[EMAIL PROTECTED]>
>>>> To: <[email protected]>
>>>> Sent: Thursday, December 30, 2004 1:20 AM
>>>> Subject: Re: UTF8Encoder question...
>>>>
>>>>> Ias,
>>>>>
>>>>> Even considering systems that can't display the SOAP message properly for lack of a Unicode font,
>>>>> I think the default encoding should be as-is, not escaping.
>>>>>
>>>>> SOAP messages are not for display, and from a web services toolkit's point of view it is better to generate more compact SOAP messages.
>>>>>
>>>>
>>>> SOAP messages are not for presentation but should be readable :-)
>>>>
>>>>> For display, the application can convert the SOAP message to an appropriate encoding. (As you know, here in Korea we use EUC-KR, and the conversion takes only a few lines of Java code.)
>>>>> Also, as far as I know, Axis 1.0 and 1.1 wrote characters as-is.
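Here are a few lines of Java of the kind described. The sketch makes two assumptions. First, the application already holds the decoded message text as a `String`. Second, the JDK ships the EUC-KR charset: standard OpenJDK builds do, but the Java SE specification does not require it. `ReencodeForDisplay` and `encodeFor` are hypothetical names.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class ReencodeForDisplay {
    /** Encodes already-decoded message text into the bytes of the named charset. */
    public static byte[] encodeFor(String text, String charsetName) {
        // Characters the charset cannot represent are replaced with its replacement bytes (typically '?').
        return text.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "\uC548\uB155"; // two Hangul syllables
        byte[] eucKr = encodeFor(text, "EUC-KR");
        System.out.println(eucKr.length + " bytes in EUC-KR"); // 4 bytes in EUC-KR
        // For a console that expects EUC-KR, wrap System.out in an EUC-KR PrintStream.
        PrintStream console = new PrintStream(System.out, true, "EUC-KR");
        console.println(text);
    }
}
```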
>>>>>
>>>>
>>>> That's a good point. However, we need to pay attention to those who may
>>>> want UTF8Encoder to keep converting as it does now. If we revert Axis 1.2's
>>>> UTF8Encoder, we should inform users of the regression clearly so as
>>>> not to puzzle them.
>>>>
>>>>> I remember that escaping was added to UTF8Encoder a few months ago to handle French accents and German umlauts. This is reflected in the test.encoding.TestString test case.
>>>>>
>>>>
>>>> The current mechanism came up in April. At that time, the code contained
>>>>
>>>> /*
>>>> TODO: Try fixing this block instead of code above.
>>>> if (character < 0x80) {
>>>> writer.write(character);
>>>> } else if (character < 0x800) {
>>>> writer.write((0xC0 | character >> 6));
>>>> writer.write((0x80 | character & 0x3F));
>>>> } else if (character < 0x10000) {
>>>> writer.write((0xE0 | character >> 12));
>>>> writer.write((0x80 | character >> 6 & 0x3F));
>>>> writer.write((0x80 | character & 0x3F));
>>>> } else if (character < 0x200000) {
>>>> writer.write((0xF0 | character >> 18));
>>>> writer.write((0x80 | character >> 12 & 0x3F));
>>>> writer.write((0x80 | character >> 6 & 0x3F));
>>>> writer.write((0x80 | character & 0x3F));
>>>> }
>>>> */
>>>>
>>>> but the commented-out part was gone by the 1_2RC2 tag.
>>>>
>>>>> Any thought?
>>>>>
>>>>
>>>> So, what you're saying is that the current UTF8Encoder behavior
>>>> comes from the test case. In other words, if you change the encoder to
>>>> output characters as-is, the test fails. Could we make them consistent,
>>>> that is, have UTF8Encoder output without conversion while the test
>>>> case still passes?
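One way to make the two consistent assumes the test compares deserialized values rather than serialized text. This document does not show how test.encoding.TestString actually compares them. Under that assumption, note that an XML parser yields the same string for the escaped form and the as-is form. Below is a minimal sketch using the JDK's DOM parser; `EquivalentForms` is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class EquivalentForms {
    /** Parses an XML string and returns the text content of its root element. */
    public static String textOf(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String escaped = "<s>&#xE9;&#xFC;</s>"; // e-acute and u-umlaut as character references
        String asIs = "<s>\u00E9\u00FC</s>";    // the same characters written directly
        System.out.println(textOf(escaped).equals(textOf(asIs))); // true
    }
}
```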
>>>>
>>>> Ias
>>>>
>>>> P.S. I'd like to hear opinions on changing UTF8Encoder's default
>>>> behavior (and possibly creating another encoder or an option for
>>>> conversion). Once we pass all tests with the changed encoder, it is
>>>> worth adopting the change, I believe.
>>>>
>>>>> /Jongjin
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Ias" <[EMAIL PROTECTED]>
>>>>> To: <[email protected]>
>>>>> Sent: Wednesday, December 29, 2004 1:53 AM
>>>>> Subject: RE: UTF8Encoder question...
>>>>>
>>>>>>
>>>>>> From: Jongjin Choi [mailto:[EMAIL PROTECTED]
>>>>>> Sent: Tuesday, December 28, 2004 11:56 AM
>>>>>> To: [email protected]
>>>>>> Subject: UTF8Encoder question...
>>>>>>
>>>>>>
>>>>>> Dims and all,
>>>>>>
>>>>>> UTF8Encoder writes an escaped string when a character is over 0x7F.
>>>>>> The escaping does not seem to be necessary because
>>>>>> the Writer (not OutputStream) is used.
>>>>>>
>>>>>> I think this could simply be (line 86):
>>>>>>
>>>>>> writer.write(character);
>>>>>>
>>>>>> instead of (lines 86-88):
>>>>>> writer.write("&#x");
>>>>>> writer.write(Integer.toHexString(character).toUpperCase());
>>>>>> writer.write(";");
>>>>>>
>>>>>> The escaping just increases the message size.
>>>>>>
>>>>> ias> Yes, it does. However, I think representing a character whose code point
>>>>> ias> is over 0x7F as an &#x... XML character reference is one of the aims of the
>>>>> ias> encoder, because some systems can't display such characters properly since
>>>>> ias> they have no Unicode-wide fonts installed. If it's 100% certain that every node
>>>>> ias> in a messaging system has no problem with "as-is" character
>>>>> ias> representation in an XML instance, it would be much more efficient to use a
>>>>> ias> compact encoder, as you pointed out, instead of UTF8Encoder. Interestingly,
>>>>> ias> AbstractXMLEncoder (which is not instantiable) works that way.
>>>>> ias> Consequently, it would be a good idea to create a new encoder that optimizes
>>>>> ias> message size and is easy to configure. (Yes, we can recommend
>>>>> ias> it to users dealing with non-Latin character systems :-)
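A variation on the configurable encoder, not proposed in the thread, is to escape only those code points the target output charset cannot represent. UTF-8 output then stays compact, while output in a narrow charset such as US-ASCII stays safe. This does not address missing fonts, only unrepresentable characters. Below is a sketch using `CharsetEncoder.canEncode`. The class name is hypothetical, and markup escaping (`&`, `<`, ...) is omitted for brevity.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class CharsetAwareEscaper {
    private final CharsetEncoder encoder;

    public CharsetAwareEscaper(Charset target) {
        this.encoder = target.newEncoder(); // CharsetEncoder is not thread-safe: one per thread
    }

    /** Writes a code point as-is if the target charset can encode it, else as &#xHHHH;. */
    public String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                out.append(ch);
            } else {
                out.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "caf\u00E9 \u51E6\u7406";
        System.out.println(new CharsetAwareEscaper(StandardCharsets.US_ASCII).escape(text));
        System.out.println(new CharsetAwareEscaper(StandardCharsets.ISO_8859_1).escape(text));
        System.out.println(new CharsetAwareEscaper(StandardCharsets.UTF_8).escape(text));
    }
}
```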
>>>>>>
>>>>>> Happy new year,
>>>>>>
>>>>>> Ias
>>>>>>
>>>>>> P.S. I'm going to switch [EMAIL PROTECTED] to [EMAIL PROTECTED] (soon,
>>>>>> very soon).
>>>>>>
>>>>>>
>>>>>> If an OutputStream were used, the escaping or UTF-8 conversion (which
>>>>>> existed in the old UTF8Encoder.java) would be needed.
>>>>>>
>>>>>> Thought?
>>>>>>
>>>>>> /Jongjin
>>>>>>
>>>>>>
>>>>
>>>
>>
