Patil, Pushkar wrote:
Hi Alberto,
You were right; I was assuming it to be UTF-8 (I thought that was the
default charmap for the "ko" locale).
Anyway, I have now set the shell locale to "ko.UTF-8" and removed the
setlocale call from the code.
So now I am sure that the input is in UTF-8.
Using XMLString::transcode I am able to convert char* to XMLCh* and
XMLCh* back to char* without any loss.
But when I write the same string to XML and view the file, the Korean
string in the XML is different!
Am I missing something while writing the XML?
You are writing the XML file using the default encoding (i.e. UTF-8),
but using "cat" to display it; try loading it in a Unicode-enabled
editor (emacs, maybe) instead.
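One way to check the file without relying on how "cat" and the terminal interpret its bytes is to dump them in hex. A minimal sketch, assuming a small standalone C++ diagnostic; hexDump and readFileBytes are hypothetical helpers, not part of the Xerces API:

```cpp
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: format each byte of `data` as two lowercase hex
// digits separated by spaces, so the encoding can be inspected without
// depending on how the terminal renders the bytes.
std::string hexDump(const std::string& data) {
    std::string out;
    char buf[4];
    for (unsigned char c : data) {
        std::snprintf(buf, sizeof buf, "%02x", c);
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}

// Hypothetical helper: read a whole file (e.g. the generated XML) as raw bytes.
std::string readFileBytes(const char* path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

For example, printing hexDump(readFileBytes("out.xml")) (the file name is only an example) on a UTF-8 file shows each Hangul syllable as three bytes with a lead byte between ea and ed (한, U+D55C, is ed 95 9c), whereas in EUC-KR the same syllable is the two bytes c7 d1.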
Secondly, even though my shell locale is now UTF-8, the UTF-8
transcoder is still distorting the output.
Assuming that ko.UTF-8 is a real UTF-8 encoding may be wrong (UTF-8
doesn't need to specify a locale like "ko", as it is locale-independent,
so it could well be that ko.UTF-8 is still a Korean locale using UTF-8
instead of EUC as its internal representation); try setting your shell
locale to just UTF-8, and verify that all the parts of your code
(XMLString::transcode, "cat", and the UTF-8 transcoder) work.
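To check which character set a locale really uses, a standalone diagnostic can ask the C library via POSIX nl_langinfo(CODESET). A sketch, assuming a POSIX system; the helper names are hypothetical, and because it calls setlocale it is meant as a separate test program rather than something to add to the Xerces code:

```cpp
#include <cctype>
#include <clocale>
#include <langinfo.h>
#include <string>

// Hypothetical helper: true if a codeset name denotes UTF-8. Platforms
// spell it differently ("UTF-8", "utf8", ...), so compare
// case-insensitively and ignore '-' and '_'.
bool isUtf8CodesetName(const char* name) {
    if (name == nullptr) return false;
    std::string norm;
    for (const char* p = name; *p; ++p) {
        if (*p == '-' || *p == '_') continue;
        norm += static_cast<char>(std::tolower(static_cast<unsigned char>(*p)));
    }
    return norm == "utf8";
}

// Hypothetical helper: adopt the locale named by the environment
// (LC_ALL / LC_CTYPE / LANG) and report whether its codeset is UTF-8.
// Note that this changes the process-wide locale.
bool shellLocaleIsUtf8() {
    if (std::setlocale(LC_ALL, "") == nullptr) return false;  // locale not available
    return isUtf8CodesetName(nl_langinfo(CODESET));
}
```

From the shell, the POSIX `locale charmap` command reports the same codeset name, e.g. when run with LC_ALL=ko.UTF-8.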
Alberto
Regards,
Pushkar
-----Original Message-----
From: Alberto Massari [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 28, 2008 1:53 PM
To: [email protected]
Subject: Re: Unable to transcode korean chars on Solaris
Hi Pushkar,
XMLString::transcode depends on the current locale, so I wouldn't call
setlocale if you know that the input string was entered using the
current shell locale. As for the other attempt, you are creating a
UTF-8 transcoder and asking it to convert the input string, but this
would only work if your shell locale is UTF-8.
So, either work with whatever locale is used by the shell
(XMLString::transcode) or create the appropriate transcoder for the
input string you are dealing with (don't blindly use UTF-8).
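The same advice (name the encoding the input actually uses instead of assuming UTF-8) can be illustrated outside Xerces with POSIX iconv. This swaps in iconv for a Xerces transcoder purely as an illustration; toUtf8 is a hypothetical helper, and encoding names such as "EUC-KR" vary by platform (see `iconv -l`). Some older systems declare iconv's input argument as const char** rather than char**, which would need a small adjustment.

```cpp
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert `input` from the named source encoding
// to UTF-8 using POSIX iconv, in fixed-size output chunks.
std::string toUtf8(const std::string& input, const char* fromEncoding) {
    iconv_t cd = iconv_open("UTF-8", fromEncoding);
    if (cd == (iconv_t)-1) throw std::runtime_error("unsupported encoding");

    std::string out;
    char* inPtr = const_cast<char*>(input.data());
    std::size_t inLeft = input.size();
    char buf[256];
    while (inLeft > 0) {
        char* outPtr = buf;
        std::size_t outLeft = sizeof buf;
        std::size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
        int err = errno;  // save before anything else can change errno
        out.append(buf, static_cast<std::size_t>(outPtr - buf));  // keep converted bytes
        // E2BIG only means the chunk buffer is full: loop and continue.
        if (rc == (std::size_t)-1 && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("invalid or incomplete input sequence");
        }
    }
    iconv_close(cd);
    return out;
}
```

For example, toUtf8(data, "EUC-KR") would turn the EUC-KR bytes c7 d1 (한) into the UTF-8 bytes ed 95 9c, assuming the platform's iconv recognizes that name.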
Alberto
Patil, Pushkar wrote:
Hi All,
I am facing a problem transcoding Korean characters on Solaris.
Some details:
Xerces Version: 2.2
Solaris: 5.8
Locale: Korean
The code works fine on AIX and Windows (for both the en_US and Korean
locales).
I receive Korean data from the database as a multibyte char*, and to
transcode it I used the XMLString::transcode method.
When I write the transcoded XMLCh* to XML, the string is distorted.
I also tried using XMLTranscoder, with no success.
To debug the problem I have written a small C-style program
(OnlyXerces.cpp) which reproduces the output (it receives the Korean
characters as a command-line argument).
I have attached the program, the console output from the program and
the data from the generated xmls.
It would be great if someone could point out the problem in my code or
direct me to an alternative/better approach.
Regards,
Pushkar
Snippets of code:

setlocale(LC_ALL, "");  // returns "ko"

////////// Transcoding using XMLString::transcode //////////
char* strIn = argv[1];  // argv[1] contains the input Korean characters
XMLCh* tag = XMLString::transcode(strIn);
// ... write XML using "tag"

////////// Transcoding using XMLTranscoder //////////
XMLRecognizer::Encodings encodingEnum = XMLRecognizer::UTF_8;
XMLTransService::Codes failReason;
XMLTranscoder* utf8Transcoder =
    XMLPlatformUtils::fgTransService->makeNewTranscoderFor(encodingEnum,
        failReason, 16*1024);
XMLCh* outputStr = NULL;
unsigned int charsEaten = 0;
unsigned int length = strlen(strIn);
unsigned char* sizes = new unsigned char[length + 1];
outputStr = new XMLCh[length + 1];  // one extra slot for the terminator
unsigned int chars_stored = utf8Transcoder->transcodeFrom((const XMLByte*) strIn,
    length, outputStr, length, charsEaten, sizes);
outputStr[chars_stored] = 0;  // transcodeFrom does not null-terminate its output
// ... write XML using "outputStr"
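For reference, this is roughly what a UTF-8 transcodeFrom step has to do: decode UTF-8 bytes into UTF-16 code units (what XMLCh holds) and record how many source bytes produced each unit, much like the sizes array in the snippet above. It is a plain C++ sketch, not Xerces code; decodeUtf8 and DecodeResult are hypothetical, and substituting U+FFFD for malformed bytes is this sketch's own choice (Xerces' transcoder may report an error instead). It also shows why EUC-KR input fed to a UTF-8 decoder comes out distorted: the EUC-KR bytes c7 d1 (한) are not valid UTF-8.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: UTF-16 code units plus, for each unit,
// the number of source bytes it was decoded from.
struct DecodeResult {
    std::u16string units;
    std::vector<unsigned char> sizes;
};

DecodeResult decodeUtf8(const std::string& bytes) {
    DecodeResult r;
    std::size_t i = 0, n = bytes.size();
    while (i < n) {
        unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        // Sequence length from the lead byte; 0 means "not a valid lead byte".
        std::size_t len = b0 < 0x80 ? 1 : (b0 >> 5) == 0x6 ? 2
                        : (b0 >> 4) == 0xE ? 3 : (b0 >> 3) == 0x1E ? 4 : 0;
        char32_t cp = len == 1 ? b0 : len == 2 ? (b0 & 0x1F)
                    : len == 3 ? (b0 & 0x0F) : (b0 & 0x07);
        bool ok = len != 0 && i + len <= n;
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;  // not a continuation byte
            else cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong forms, surrogate code points and values above U+10FFFF.
        static const char32_t minForLen[5] = {0, 0, 0x80, 0x800, 0x10000};
        if (ok && (cp < minForLen[len] || cp > 0x10FFFF ||
                   (cp >= 0xD800 && cp <= 0xDFFF))) ok = false;
        if (!ok) {  // replace one bad byte with U+FFFD and resynchronize
            r.units.push_back(u'\uFFFD');
            r.sizes.push_back(1);
            ++i;
            continue;
        }
        if (cp >= 0x10000) {  // outside the BMP: needs a UTF-16 surrogate pair
            cp -= 0x10000;
            r.units.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            r.units.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
            r.sizes.push_back(static_cast<unsigned char>(len));
            r.sizes.push_back(0);  // in this sketch the low surrogate gets size 0
        } else {
            r.units.push_back(static_cast<char16_t>(cp));
            r.sizes.push_back(static_cast<unsigned char>(len));
        }
        i += len;
    }
    return r;
}
```

With this sketch, decodeUtf8("\xED\x95\x9C") yields the single unit U+D55C with size 3, while the EUC-KR bytes "\xC7\xD1" yield two U+FFFD replacement characters.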