[ 
http://issues.apache.org/jira/browse/AXISCPP-964?page=comments#action_12376127 
] 

Henrik Nordberg commented on AXISCPP-964:
-----------------------------------------

Nadir K. Amra wrote:

> Henrik, I am not sure how you would do that on windows or linux, but I 
> assume it can be done.  On the OS/400, the XML parser will honor whatever 
> the encoding is in the XML data that is returned. 

This depends on the parser, not on the platform. And, of course, the parser 
must honor the encoding specified. That is part of the XML standard.

> I think on the other 
> platforms it is assumed to be in UTF-8 

UTF-8 is the default encoding. But we specify it explicitly.

> so I guess that answer for 
> non-OS/400 platforms is yes, the XML data must be in UTF-8 (or I think 
> consistent with UTF-8 such as ISO-8859-1, etc.)

Here is the real problem. ISO-8859-1 is NOT consistent with UTF-8. Only ASCII 
is, that is, only byte values up to 127. Anything above 127 is encoded 
differently in the two.
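
To make the mismatch concrete, here is a minimal sketch of a structural UTF-8 
check. The name looksLikeUtf8 is made up for illustration and is not part of 
Axis C++. It only checks lead and continuation byte patterns, not overlong 
forms or surrogates. In ISO-8859-1 the character 'é' is the single byte 0xE9, 
which a UTF-8 decoder reads as the lead byte of a 3-byte sequence. That is 
consistent with the Axis Java error quoted in the issue description.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not an Axis C++ function): returns true if `s` has a
// structurally valid UTF-8 byte layout. Overlong encodings and surrogates are
// NOT rejected; this is only meant to show where ISO-8859-1 data breaks.
bool looksLikeUtf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra;  // number of continuation bytes expected
        if (c < 0x80)                extra = 0;  // plain ASCII
        else if ((c & 0xE0) == 0xC0) extra = 1;  // 110xxxxx
        else if ((c & 0xF0) == 0xE0) extra = 2;  // 1110xxxx (0xE9 lands here)
        else if ((c & 0xF8) == 0xF0) extra = 3;  // 11110xxx
        else return false;                       // stray continuation byte

        // Sequence must not run past the end of the buffer.
        if (extra > s.size() - i - 1) return false;

        // Every continuation byte must look like 10xxxxxx.
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}
```

For example, looksLikeUtf8("\xE9t\xE9") (the ISO-8859-1 bytes of "été") 
returns false, while looksLikeUtf8("\xC3\xA9t\xC3\xA9") (the same word in 
UTF-8) returns true. Pure ASCII input passes in both encodings.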

I think this truly is a bug, except on OS/400, which is the only platform that 
has special code in an #ifdef.
As it is now, users on other platforms must provide their own UTF-8 conversion. 
This, I think, is too much to ask of them, and most don't know that they need 
to do so.
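
As a rough idea of what such a conversion looks like, here is a sketch that 
assumes the outgoing bytes are ISO-8859-1. That is an assumption on my part; 
the actual encoding depends on what the service code puts into its strings. 
The name latin1ToUtf8 is hypothetical and is not the toUTF8() used in the 
OS/400 branch.

```cpp
#include <string>

// Hypothetical helper (not an Axis C++ function): converts ISO-8859-1 bytes
// to UTF-8. Assumes the input really is ISO-8859-1; feeding it data that is
// already UTF-8 would double-encode it.
std::string latin1ToUtf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte becomes two
    for (char ch : in) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            // ASCII is identical in both encodings.
            out.push_back(static_cast<char>(c));
        } else {
            // U+0080..U+00FF become a 2-byte sequence: 110000xx 10xxxxxx.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

With this, latin1ToUtf8("caf\xE9") yields the bytes "caf\xC3\xA9". In the 
non-OS/400 branch something like it could presumably be applied to 
m_strBytesToSend before it is written to the channel, but I have not tried 
that.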

Can we simply use UTF-8 on all platforms, not just on OS/400?

 - Henrik

> Server response not UTF-8 encoded (but claims to be)
> ----------------------------------------------------
>
>          Key: AXISCPP-964
>          URL: http://issues.apache.org/jira/browse/AXISCPP-964
>      Project: Axis-C++
>         Type: Bug

>   Components: SOAP
>     Versions: current (nightly)
>  Environment: All platforms, except OS/400
>     Reporter: Henrik Nordberg

>
> (See the end of this description for a one-liner that works around this 
> problem for most cases.)
> SoapSerializer.cpp, line 379 says
> serialize( "<?xml version='1.0' encoding='utf-8' ?>", NULL);
> that is, it declares the SOAP response to be UTF-8 encoded. But this is only 
> true for OS/400, as can be seen in HTTPTransport.cpp, lines 311-
> #ifndef __OS400__
>         *m_pActiveChannel << this->getHTTPHeaders ();
>         *m_pActiveChannel << this->m_strBytesToSend.c_str ();
> #else
>         // Ebcdic (OS/400) systems need to convert the data to UTF-8. Note
>         // that free() is correctly used and should not be changed to delete().
>         const char *buf = this->getHTTPHeaders ();
>         utf8Buf = toUTF8((char *)buf, strlen(buf)+1);
>         *m_pActiveChannel << utf8Buf;
>         free(utf8Buf);
>         utf8Buf = NULL;
>         utf8Buf = toUTF8((char *)this->m_strBytesToSend.c_str(),
>                          this->m_strBytesToSend.length()+1);
>         *m_pActiveChannel << utf8Buf;
>         free(utf8Buf);
>         utf8Buf = NULL;
> #endif
> As a result, clients try to decode the response as UTF-8 and fail whenever 
> the response contains non-ASCII characters (i.e., byte values > 127).
> Axis Java, for example, will produce this error upon decoding: 
> "java.io.UTFDataFormatException: Invalid byte 2 of 3-byte UTF-8 sequence."
> A simple workaround is to change SoapSerializer.cpp, line 379:
> from
> serialize( "<?xml version='1.0' encoding='utf-8' ?>", NULL);
> to
> serialize( "<?xml version='1.0' encoding='ISO-8859-1' ?>", NULL);
> The real fix, however, is to encode the response with UTF-8 for all platforms 
> (not just OS/400).
