Hi Nadir,

Whatever happened to this? Did we reach any conclusions?

John Hawkins




Nadir Amra <[EMAIL PROTECTED]>

22/12/2004 08:37

Please respond to
"Apache AXIS C Developers List"

To
"'Apache AXIS C Developers List'" <[email protected]>
cc
Subject
platform support: internationalization and EBCDIC vs ASCII





Correct me if I am wrong....and sorry for the long note but it is
necessary.

The AXIS code has a restriction: it assumes everything is in UTF-8, so
the locale of the process must be UTF-8.  Thus the code works only in
processes where the locale is set to UTF-8 or to a single-byte ASCII-based
character set such as the Latin-1 locales (and then only for the 7-bit
ASCII characters, which are encoded identically in UTF-8).  For locales
that are neither single-byte ASCII-based nor UTF-8, the code does not
work well.  Obviously the code does not work on EBCDIC-based systems such
as OS/400.

I need this restriction removed in version 1.5.

To remove the restriction, the code needs to be sensitive to the locale of
the process that the client is running in and assume any data received
from the client that is to be passed to a web service is in the character
set of the locale, and thus needs to be converted to UTF-8.  Similarly,
any data received from the web service needs to be converted to the
character set of the running process, since the various C-runtime string
functions are dependent on the locale of the process in order for the
functions to work properly.
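
A minimal sketch of the two conversions described above, assuming a POSIX
system where iconv() and nl_langinfo() are available.  The function names
transcode(), localeToUtf8() and utf8ToLocale() are made up for illustration
and are not part of AXIS; on platforms where iconv() takes a const char**
input argument the const_cast would need to be dropped.

```cpp
#include <cerrno>
#include <clocale>
#include <iconv.h>
#include <langinfo.h>
#include <stdexcept>
#include <string>

// Hypothetical helper (not AXIS code): convert `in` from codeset `from`
// to codeset `to` using iconv.
std::string transcode(const std::string& in, const char* from, const char* to)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        throw std::runtime_error("conversion not supported");

    std::string out;
    char buf[256];
    char* src = const_cast<char*>(in.data());   // iconv does not modify the input
    size_t srcLeft = in.size();

    while (srcLeft > 0) {
        char* dst = buf;
        size_t dstLeft = sizeof(buf);
        size_t rc = iconv(cd, &src, &srcLeft, &dst, &dstLeft);
        int err = errno;                         // save before any other call
        out.append(buf, sizeof(buf) - dstLeft);
        if (rc == (size_t)-1 && err != E2BIG) {  // E2BIG only means "buffer full"
            iconv_close(cd);
            throw std::runtime_error("invalid or incomplete input");
        }
    }

    // Flush any shift state (matters only for stateful encodings).
    char* dst = buf;
    size_t dstLeft = sizeof(buf);
    iconv(cd, NULL, NULL, &dst, &dstLeft);
    out.append(buf, sizeof(buf) - dstLeft);

    iconv_close(cd);
    return out;
}

// Client data -> wire: codeset of the process locale to UTF-8.
std::string localeToUtf8(const std::string& in)
{
    std::setlocale(LC_ALL, "");                  // adopt the process locale
    return transcode(in, nl_langinfo(CODESET), "UTF-8");
}

// Wire -> client data: UTF-8 to the codeset of the process locale.
std::string utf8ToLocale(const std::string& in)
{
    std::setlocale(LC_ALL, "");
    return transcode(in, "UTF-8", nl_langinfo(CODESET));
}
```

For example, transcode("caf\xE9", "ISO-8859-1", "UTF-8") turns the single
Latin-1 byte 0xE9 into the two UTF-8 bytes 0xC3 0xA9.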

The XML parsers can handle the data coming in from the Web service no
matter what the encoding, and there is no problem on that side of things.  
I am assuming the data obtained by the XML parser is being transcoded to
UTF-8.

In addition, there are hard-coded literal strings that are assumed to be
in ASCII.  These would also need to be changed.

I plan to spend a lot of time in the next 4 weeks to get the
infrastructure built into the code to allow the code to run on OS/400.
Hopefully, the work I put in can easily be extended to other platforms so
that if someone wanted to run in a Japanese locale, it would work with
minor changes.

My thoughts are that a user can indicate whether transcoding should be
enabled via a configuration property in the property file.  When it is
enabled, the code will create transcoders to convert data from the locale
of the process to UTF-8 and from UTF-8 to the locale of the process.  I
still have to investigate whether it is possible to use the XML parser
transcoders for this.  I am looking for direction from you all on what a
good implementation would be and where in the code you think this
support would need to be added.
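
One possible shape for this, as a rough sketch only: a small holder that
creates the two converters when the property is on and does nothing when it
is off.  It assumes iconv is used rather than the XML parser transcoders;
the class name TranscoderPair and the `enabled` flag (standing in for the
configuration property) are hypothetical.

```cpp
#include <iconv.h>
#include <stdexcept>

// Hypothetical holder for the two conversion descriptors; not an AXIS class.
class TranscoderPair
{
public:
    // `enabled` stands in for the (hypothetical) configuration property;
    // `localeCodeset` is the codeset name of the process locale.
    TranscoderPair(bool enabled, const char* localeCodeset)
        : toUtf8_((iconv_t)-1), fromUtf8_((iconv_t)-1)
    {
        if (!enabled)
            return;                   // transcoding off: no converters created
        toUtf8_ = iconv_open("UTF-8", localeCodeset);
        fromUtf8_ = iconv_open(localeCodeset, "UTF-8");
        if (!active()) {
            close();
            throw std::runtime_error("no converter for locale codeset");
        }
    }

    ~TranscoderPair() { close(); }

    // True only when both directions are available.
    bool active() const
    {
        return toUtf8_ != (iconv_t)-1 && fromUtf8_ != (iconv_t)-1;
    }

    iconv_t toUtf8() const { return toUtf8_; }
    iconv_t fromUtf8() const { return fromUtf8_; }

private:
    TranscoderPair(const TranscoderPair&);              // non-copyable
    TranscoderPair& operator=(const TranscoderPair&);

    void close()
    {
        if (toUtf8_ != (iconv_t)-1) { iconv_close(toUtf8_); toUtf8_ = (iconv_t)-1; }
        if (fromUtf8_ != (iconv_t)-1) { iconv_close(fromUtf8_); fromUtf8_ = (iconv_t)-1; }
    }

    iconv_t toUtf8_;
    iconv_t fromUtf8_;
};
```

Callers would check active() before each send or receive and pass the data
through unchanged when it returns false, so existing UTF-8 setups would see
no change in behaviour.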

As for the literal strings that should be in the Latin-1 character set,
this is easily worked around by putting the string in a buffer and
converting it using the PLATFORM_STRTOASC() macro (currently in each
PlatformSpecificXXXX.hpp file).  In addition, if data in a buffer is known
to be in the Latin-1 character set and needs to be converted to the
character set of the process, PLATFORM_ASCTOSTR() can be used.  For
ASCII-based systems, both macros are identity macros.  I plan on doing
this as a first stage, which should be a benign change.
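
To illustrate the buffer approach, here is a sketch of how a literal could
be routed through the macro.  The macro definitions shown are assumptions
for illustration, not copied from the PlatformSpecificXXXX.hpp files: the
use of __OS400__ as the platform test, the two hypothetical* conversion
functions, and the wireLiteral() helper are all made up.

```cpp
#include <cstddef>
#include <cstring>

// Assumed definitions, for illustration only.
#ifdef __OS400__
// EBCDIC build: hypothetical in-place converters supplied by the platform layer.
char* hypotheticalStrToAsc(char* s);   // process character set -> ASCII
char* hypotheticalAscToStr(char* s);   // ASCII -> process character set
#define PLATFORM_STRTOASC(x) hypotheticalStrToAsc(x)
#define PLATFORM_ASCTOSTR(x) hypotheticalAscToStr(x)
#else
// ASCII-based build: identity macros, so the change is benign.
#define PLATFORM_STRTOASC(x) (x)
#define PLATFORM_ASCTOSTR(x) (x)
#endif

// Hypothetical helper: copy a source-code literal into a writable buffer
// and convert it so that the bytes put on the wire are ASCII.
char* wireLiteral(char* buf, std::size_t size, const char* literal)
{
    std::strncpy(buf, literal, size - 1);
    buf[size - 1] = '\0';              // strncpy may not terminate on truncation
    return PLATFORM_STRTOASC(buf);
}
```

The literal has to be copied first because, on an EBCDIC system, the
conversion would presumably modify the buffer, and string literals are not
writable.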

What are your thoughts?

