Re: platform support: internationalization and EBCDIC vs ASCII

John Hawkins Tue, 01 Feb 2005 09:50:59 -0800

Hi Nadir,

whatever happened to this? Did we get any conclusions?

John Hawkins

Nadir Amra <[EMAIL PROTECTED]>

22/12/2004 08:37

Please respond to
"Apache AXIS C Developers List"

To	"'Apache AXIS C Developers List'" <[email protected]>
cc
Subject	platform support: internationalization and EBCDIC vs ASCII

Correct me if I am wrong, and sorry for the long note, but it is necessary.

The AXIS code has a restriction that the locale of the process must be UTF-8: it assumes everything is in UTF-8. Thus the code works only in processes where the locale is set to UTF-8 or to a single-byte ASCII-based character set such as the Latin-1 locales (since the ASCII range of such a character set is a subset of UTF-8). For locales that are neither single-byte nor UTF-8, the code does not work so well. Obviously the code does not work on EBCDIC-based systems such as OS/400. I need this restriction removed in version 1.5.

To remove the restriction, the code needs to be sensitive to the locale of the process that the client is running in. It should assume that any data received from the client that is to be passed to a web service is in the character set of the locale, and thus needs to be converted to UTF-8. Similarly, any data received from the web service needs to be converted to the character set of the running process, since the various C-runtime string functions depend on the locale of the process in order to work properly.

The XML parsers can handle the data coming in from the web service no matter what the encoding, so there is no problem on that side of things. I am assuming the data obtained by the XML parser is being transcoded to UTF-8. In addition, there are hard-coded literal strings that are assumed to be in ASCII. These would also need to be changed.

I plan on spending a lot of time in the next 4 weeks to get the infrastructure built into the code to allow it to run on OS/400. Hopefully, the work I put in can easily be extended to other platforms, so that if someone wanted to run in a Japanese locale, it would work with minor changes.

My thoughts are that a user can indicate whether transcoding should be enabled via a configuration property in the property file.
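
The conversion described above (process character set to UTF-8 and back) could look roughly like the following. This is only a minimal sketch, assuming POSIX iconv() is available on the platform. The helper name transcode() and the character set names passed to it are hypothetical and are not taken from the AXIS code.

```cpp
#include <iconv.h>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of AXIS): converts `input` from the
// character set named `from` to the one named `to` using POSIX iconv().
std::string transcode(const std::string& input, const char* from, const char* to)
{
    iconv_t cd = iconv_open(to, from);   // note the order: target first
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed: unsupported conversion");

    std::string output;
    char buffer[256];
    // Some platforms declare the input pointer as const char**;
    // glibc uses char**, which is what this sketch assumes.
    char* inPtr = const_cast<char*>(input.data());
    size_t inLeft = input.size();

    while (inLeft > 0)
    {
        char* outPtr = buffer;
        size_t outLeft = sizeof(buffer);
        size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
        int err = errno;                 // save before anything can change it
        output.append(buffer, sizeof(buffer) - outLeft);

        // E2BIG only means the chunk buffer filled up, so loop again.
        // Any other error means invalid or truncated input.
        if (rc == (size_t)-1 && err != E2BIG)
        {
            iconv_close(cd);
            throw std::runtime_error("iconv failed: invalid or incomplete input");
        }
    }

    // Stateful encodings would also need a final flush call here;
    // it is omitted because Latin-1 and UTF-8 do not require it.
    iconv_close(cd);
    return output;
}
```

For example, transcode(s, "ISO-8859-1", "UTF-8") turns Latin-1 bytes into UTF-8, and swapping the two names converts back. The name of the process's own character set could presumably be obtained with nl_langinfo(CODESET) after calling setlocale(LC_ALL, "").
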
When transcoding is enabled, the code will create transcoders to convert data from the locale of the process to UTF-8 and from UTF-8 to the locale of the process. I still have to investigate whether the XML parser's transcoders can be used for this. I am looking for direction from you all on what a good implementation would be and where in the code you think this support would need to be added.

As for the literal strings that should be in the Latin-1 character set, this is easily worked around by putting the string in a buffer and converting it with the PLATFORM_STRTOASC() macro (currently in each PlatformSpecificXXXX.hpp file). In addition, if data in a buffer is known to be in the Latin-1 character set and needs to be converted to the character set of the process, PLATFORM_ASCTOSTR() can be used. For ASCII-based systems, both macros are identity macros. I plan on doing this as a first stage, which should be a benign change.

What are your thoughts?
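
The identity-versus-converting macro idea can be illustrated as follows. This is a sketch under stated assumptions, not the actual AXIS definitions: the names SKETCH_STRTOASC, SKETCH_EBCDIC_PLATFORM, sketchEbcdicToAscii() and sketchStrToAsc() are hypothetical, the real macros' signatures may differ, and the mapping covers only the space, digits and letters of EBCDIC code page 037.

```cpp
#include <cstddef>

// Hypothetical: map one EBCDIC (code page 037) byte to its ASCII code.
// Only space, digits and letters are handled; anything else becomes '?'.
// ASCII values are written as numbers so the code does not depend on the
// compiler's own execution character set.
inline char sketchEbcdicToAscii(unsigned char c)
{
    if (c == 0x40)              return static_cast<char>(0x20);              // space
    if (c >= 0xF0 && c <= 0xF9) return static_cast<char>(0x30 + (c - 0xF0)); // 0-9
    if (c >= 0xC1 && c <= 0xC9) return static_cast<char>(0x41 + (c - 0xC1)); // A-I
    if (c >= 0xD1 && c <= 0xD9) return static_cast<char>(0x4A + (c - 0xD1)); // J-R
    if (c >= 0xE2 && c <= 0xE9) return static_cast<char>(0x53 + (c - 0xE2)); // S-Z
    if (c >= 0x81 && c <= 0x89) return static_cast<char>(0x61 + (c - 0x81)); // a-i
    if (c >= 0x91 && c <= 0x99) return static_cast<char>(0x6A + (c - 0x91)); // j-r
    if (c >= 0xA2 && c <= 0xA9) return static_cast<char>(0x73 + (c - 0xA2)); // s-z
    return static_cast<char>(0x3F);                                          // '?'
}

// Hypothetical: convert a NUL-terminated buffer in place and return it.
inline char* sketchStrToAsc(char* s)
{
    for (char* p = s; *p != '\0'; ++p)
        *p = sketchEbcdicToAscii(static_cast<unsigned char>(*p));
    return s;
}

// On an EBCDIC build the macro converts; everywhere else it is an identity.
#ifdef SKETCH_EBCDIC_PLATFORM
#define SKETCH_STRTOASC(x) sketchStrToAsc(x)
#else
#define SKETCH_STRTOASC(x) (x)
#endif
```

Call sites would wrap a buffer holding the literal in SKETCH_STRTOASC() before it goes on the wire. On ASCII-based builds the wrapper compiles away, which is why such a change should be benign there.
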
