[ http://issues.apache.org/jira/browse/AXISCPP-964?page=comments#action_12376127 ]
Henrik Nordberg commented on AXISCPP-964:
-----------------------------------------

Nadir K. Amra wrote:
> Henrik, I am not sure how you would do that on windows or linux, but I
> assume it can be done. On the OS/400, the XML parser will honor whatever
> the encoding is in the XML data that is returned.

This is parser dependent and platform-independent. And, of course, the
parser must honor the encoding specified. It is part of the XML standard.

> I think on the other
> platforms it is assumed to be in UTF-8

UTF-8 is the default encoding. But we specify it explicitly.

> so I guess that answer for
> non-OS/400 platforms is yes, the XML data must be in UTF-8 (or I think
> consistent with UTF-8 such as ISO-8859-1, etc.)

Here is the real problem. ISO-8859-1 is NOT consistent with UTF-8. Only
ASCII is. And then only for characters up to 127. I think this truly is a
bug, except on OS/400, which is the only platform that has special code in
an #ifdef. As it is now, users on other platforms must provide their own
UTF-8 conversion. This, I think, is too much to ask of them, and most don't
know that they need to do so.

Can we simply use UTF-8 on all platforms, not just on OS/400?

- Henrik

> Server response not UTF-8 encoded (but claims to be)
> ----------------------------------------------------
>
>          Key: AXISCPP-964
>          URL: http://issues.apache.org/jira/browse/AXISCPP-964
>      Project: Axis-C++
>         Type: Bug
>   Components: SOAP
>     Versions: current (nightly)
>  Environment: All platforms, except OS/400
>     Reporter: Henrik Nordberg
>
> (See the end of this description for a one-liner that works around this
> problem for most cases.)
> SoapSerializer.cpp, line 379 says
>     serialize( "<?xml version='1.0' encoding='utf-8' ?>", NULL);
> that is that the SOAP response is UTF-8 encoded.
> But this is only true for OS/400, as can be seen in HTTPTransport.cpp,
> lines 311-
> #ifndef __OS400__
>     *m_pActiveChannel << this->getHTTPHeaders ();
>     *m_pActiveChannel << this->m_strBytesToSend.c_str ();
> #else
>     // Ebcdic (OS/400) systems need to convert the data to UTF-8. Note that free() is
>     // correctly used and should not be changed to delete().
>     const char *buf = this->getHTTPHeaders ();
>     utf8Buf = toUTF8((char *)buf, strlen(buf)+1);
>     *m_pActiveChannel << utf8Buf;
>     free(utf8Buf);
>     utf8Buf = NULL;
>     utf8Buf = toUTF8((char *)this->m_strBytesToSend.c_str(),
>                      this->m_strBytesToSend.length()+1);
>     *m_pActiveChannel << utf8Buf;
>     free(utf8Buf);
>     utf8Buf = NULL;
> #endif
> Clients therefore try to decode the response as UTF-8, and they fail
> whenever the response contains non-ASCII characters (i.e., byte values > 127).
> Axis Java, for example, will produce this error upon decoding:
> "java.io.UTFDataFormatException: Invalid byte 2 of 3-byte UTF-8 sequence."
> A simple workaround is to change SoapSerializer.cpp, line 379:
> from
>     serialize( "<?xml version='1.0' encoding='utf-8' ?>", NULL);
> to
>     serialize( "<?xml version='1.0' encoding='ISO-8859-1' ?>", NULL);
> The real fix, however, is to encode the response with UTF-8 for all platforms
> (not just OS/400).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
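
For illustration only, here is a minimal sketch of the kind of conversion the issue asks for on non-OS/400 platforms. It assumes the outgoing bytes are ISO-8859-1, which the issue suggests by its workaround but does not state outright. The function name latin1ToUtf8 is hypothetical and is not part of Axis C++; it is also not the toUTF8() routine used in the OS/400 branch.

```cpp
#include <string>

// Hypothetical helper (not an Axis C++ API): re-encodes a string whose bytes
// are assumed to be ISO-8859-1 as UTF-8.
std::string latin1ToUtf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte becomes two bytes

    for (unsigned char c : in) {
        if (c < 0x80) {
            // 0..127 is plain ASCII and is encoded identically in UTF-8.
            out.push_back(static_cast<char>(c));
        } else {
            // 128..255 maps to code points U+0080..U+00FF, which UTF-8
            // encodes as two bytes: 110xxxxx 10xxxxxx.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

This shows why ISO-8859-1 output cannot simply be labelled as UTF-8: the single byte 0xE9 ("e" with acute accent) must become the two bytes 0xC3 0xA9. Sent unconverted, 0xE9 looks like the lead byte of a 3-byte UTF-8 sequence, which is consistent with the Axis Java error quoted above.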
