Hi,

I transferred these XML files from Solaris to Windows via binary FTP and
opened them in Unicode-aware editors and browsers, but got the same
garbage output.
I think this issue could be linked to
https://issues.apache.org/jira/browse/XERCESC-1305 (even though that
issue talks about ISO-8859-1).

I checked the encoding with nl_langinfo(CODESET) and found it to be 5601
(which I think must be KSC5601-1987). I am not sure whether the default
transcoder (iconv) for Solaris handles this. (I tried
makeNewTranscoderFor("KSC5601", failReason, 16*1024).)

So I rebuilt Xerces using the ICU transcoder and hey, XMLString::transcode
is working fine!! (even with the setlocale call!!!)

Still wondering whether and how the iconv transcoder will be able to do it?

Regards,
Pushkar

-----Original Message-----
From: Patil, Pushkar
Sent: Thursday, May 29, 2008 4:22 PM
To: '[email protected]'
Subject: RE: Unable to transcode korean chars on Solaris

Hi Alberto,

>> You are writing the XML file using the default encoding (i.e. UTF-8),
>> but using a "cat" command to display it; try instead loading it in a
>> Unicode-enabled editor (emacs, maybe)

You are right about this. Unfortunately I don't have admin access to
install Unicode editors. So I wrote a parser for the same XML (the one I
write) and it gives me the correct output. So as far as writing the
Korean characters to XML is concerned, it works properly.

So basically my sample code works once the setlocale call is removed.
But unfortunately in our actual code we need to use string conversion
functions like wcstombs and mbstowcs, which don't work without setlocale.
Hence I am back in the soup :-(
Any suggestions on this?

>> try setting your shell locale to just UTF-8

I am not sure how to do that; still working on it.
Regards,
Pushkar

-----Original Message-----
From: Alberto Massari [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 28, 2008 5:59 PM
To: [email protected]
Subject: Re: Unable to transcode korean chars on Solaris

Patil, Pushkar wrote:
> Hi Alberto,
>
> You were right, I was assuming it to be UTF-8 (I thought that was the
> default char map for the "ko" locale). Anyway, I have now set the shell
> locale to "ko.UTF-8" and removed the setlocale call in the code.
> So now I am sure that the input is in UTF-8.
>
> Using XMLString::transcode I am able to convert char* to XMLCh* and
> XMLCh* to char* without any loss.
> But if I write the same string to XML and view the file, the Korean
> string in the XML is different!!
> Am I missing something while writing the XML?
>
You are writing the XML file using the default encoding (i.e. UTF-8),
but using a "cat" command to display it; try instead loading it in a
Unicode-enabled editor (emacs, maybe)

> On a secondary level, even though my shell locale is UTF-8 now, the
> UTF-8 transcoder is still distorting the output.
>
Assuming that ko.UTF-8 is a real UTF-8 encoding could be a wrong
assumption (UTF-8 doesn't need to specify a locale like "ko", as it is
locale-independent, so it could well be that ko.UTF-8 is still a Korean
locale using UTF-8 instead of EUC as internal representation); try
setting your shell locale to just UTF-8, and verify that all the parts
of your code (XMLString::transcode, "cat", and the UTF-8 transcoder) work.

Alberto

> Regards,
> Pushkar
>
> -----Original Message-----
> From: Alberto Massari [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, May 28, 2008 1:53 PM
> To: [email protected]
> Subject: Re: Unable to transcode korean chars on Solaris
>
> Hi Pushkar,
> XMLString::transcode depends on the current locale; so I
> wouldn't do the call to setlocale if you know that the input string
> was entered using the current shell locale.
> As for the other attempt,
> you are creating a UTF-8 transcoder and asking it to convert the
> input string, but this would only work if your shell locale is UTF-8.
> So, either work with whatever locale is used by the shell
> (XMLString::transcode) or create the appropriate transcoder for the
> input string you are dealing with (don't blindly use UTF-8).
>
> Alberto
>
>
> Patil, Pushkar wrote:
>
>> Hi All,
>>
>> I am facing a problem while transcoding Korean chars on Solaris.
>> Some details:
>> Xerces Version: 2.2
>> Solaris: 5.8
>> Locale: Korean
>> The code works fine on AIX and Windows (for both en_US and Korean
>> locales).
>>
>> I receive Korean data as multi-byte char* from the database, and to
>> transcode it I used the "XMLString::transcode" method.
>> When I write the transcoded XMLCh* to XML, the string is distorted.
>> I tried using XMLTranscoder with no results.
>>
>> To debug the problem I have written a small C-style program
>> (OnlyXerces.cpp) which simulates the output (it receives the Korean
>> chars as an argument).
>> I have attached the program, the console output from the program and
>> the data from the generated XMLs.
>>
>> It would be great if someone could point out the problem in my code or
>> direct me to an alternative / better approach.
>>
>> Regards,
>> Pushkar
>>
>> Snippets of code:
>>
>> setlocale(LC_ALL, ""); // output is received as "ko"
>>
>> ////////// Transcoding using XMLString::transcode //////////
>> char* strIn = argv[1]; /// argv[1] contains the input korean characters
>> XMLCh* tag = XMLString::transcode(strIn);
>> ...write xml using "tag"
>>
>> ////////// Transcoding using XMLTranscoder
>> XMLRecognizer::Encodings encodingEnum = XMLRecognizer::UTF_8;
>> XMLTranscoder* utf8Transcoder =
>>     XMLPlatformUtils::fgTransService->makeNewTranscoderFor(encodingEnum,
>>     failReason, 16*1024);
>>
>> XMLCh* outputStr = NULL;
>> unsigned int charsEaten = 0;
>> unsigned int length = strlen(strIn);
>> unsigned char* sizes = new unsigned char[ length + 1 ];
>> outputStr = new XMLCh[ length ];
>> unsigned int chars_stored = utf8Transcoder->transcodeFrom((const
>>     XMLByte*) strIn, length, outputStr, length, charsEaten, sizes );
>> ...
>> write xml using "outputStr"
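A note for anyone finding this thread later: the advice above not to use UTF-8 blindly can be backed by a cheap sanity check before the bytes are handed to a UTF-8 transcoder. The sketch below is only an illustration in standard C++. The helper name looksLikeUtf8 is hypothetical and is not part of Xerces. It reports whether a byte string is well-formed UTF-8. A failed check shows the input is in some other encoding (EUC-KR / KSC5601, for example). A passed check does not prove the input is UTF-8, since plain ASCII passes too.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (NOT a Xerces API): returns true if `s` is
// well-formed UTF-8, false otherwise.
bool looksLikeUtf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t extra;    // continuation bytes expected after b0
        unsigned long cp;     // code point being decoded
        unsigned long minCp;  // smallest code point allowed for this length
        if (b0 < 0x80)                { extra = 0; cp = b0;        minCp = 0; }
        else if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; minCp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; minCp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; minCp = 0x10000; }
        else return false;            // stray continuation or invalid lead byte

        if (extra > n - i - 1) return false;  // sequence is truncated

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < minCp) return false;                    // overlong encoding
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate code point
        i += extra + 1;
    }
    return true;
}
```

In a program like the one quoted above, such a check could be applied to argv[1] before choosing a transcoder: if it fails, the input should go through a transcoder for the actual source encoding (or through XMLString::transcode under the matching locale) rather than the UTF-8 one.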
