You don't want a routine, you need a whole service. This is a huge topic, and you might want to start by reading the Wikipedia page on Unicode.
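To show the shape of the problem, here is a rough sketch in Python (not REXX or Pipelines) of what a conversion service does: decode the UTF-8 bytes to Unicode code points, then encode those code points in the target EBCDIC code page. The function names are made up for illustration. cp037 stands in for code page 1047 because Python's standard library does not ship 1047. In cp037, x'51' is é, whose UTF-8 form is the two bytes x'C3A9'. A strict conversion raises an error on a character the target code page lacks, which is the kind of surprise to plan for.

```python
# Sketch only: Python's built-in codecs stand in for a real conversion service.
# Assumption: cp037 is used in place of code page 1047, which the standard
# library does not provide. The function names are hypothetical.

EBCDIC = "cp037"


def utf8_to_ebcdic(data: bytes, errors: str = "strict") -> bytes:
    """Decode UTF-8 to Unicode code points, then encode them as EBCDIC."""
    return data.decode("utf-8").encode(EBCDIC, errors=errors)


def ebcdic_to_utf8(data: bytes) -> bytes:
    """Decode EBCDIC bytes to Unicode code points, then encode them as UTF-8."""
    return data.decode(EBCDIC).encode("utf-8")


utf8_msg = "café".encode("utf-8")       # 5 bytes: the é takes two (C3 A9)
ebcdic_msg = utf8_to_ebcdic(utf8_msg)   # 4 bytes: the é becomes x'51'
print(ebcdic_msg.hex())                 # 83818651

# The round trip restores the original UTF-8 bytes.
assert ebcdic_to_utf8(ebcdic_msg) == utf8_msg

# A character with no cp037 equivalent (the euro sign) fails in strict mode.
try:
    utf8_to_ebcdic("€".encode("utf-8"))
except UnicodeEncodeError as exc:
    print("unmappable character:", exc.reason)

# Passing errors="replace" substitutes a placeholder instead of raising.
lossy = utf8_to_ebcdic("€".encode("utf-8"), errors="replace")
print(len(lossy))                       # 1
```

Whether to fail, substitute, or pass such characters through is a policy decision the service has to make; a single-byte translate table cannot express it.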
You need to understand the difference between a character set (charset, code page, or Unicode itself) and an encoding (such as UTF-8). ICU and ICONV could presumably be ported to CMS. Note that ICONV at least will throw an error if it finds a character that isn't in the input charset--this can be a nasty surprise. I haven't tested ICU.

...phsiii

-----Original Message-----
From: CMSTSO Pipelines Discussion List [mailto:[email protected]] On Behalf Of CMS-PIPELINES automatic digest system
Sent: Tuesday, November 20, 2012 12:03 AM
To: [email protected]
Subject: CMS-PIPELINES Digest - 1 Nov 2012 to 19 Nov 2012 (#2012-22)

There is 1 message totalling 51 lines in this issue.

Topics of the day:

  1. XLATE (or any method) for converting UTF-8 to/from EBCDIC.

----------------------------------------------------------------------

Date: Mon, 19 Nov 2012 18:50:11 +0000
From: "Larson, John E." <[email protected]>
Subject: Re: XLATE (or any method) for converting UTF-8 to/from EBCDIC.

I have been searching and searching (Google, all VM documentation I can find, pipeline forum and history pages, etc.) for days and can't figure out how to deal with ASCII UTF-8.

I've read on one site that codepage 1207/1208 "might" be the way, but of course I can't find any more specifics about these codepages, and I'm not so interested in them anyway as they're not supported by XLATE.

My requirement seems simple enough: update a CMS XML browser to display ASCII UTF-8 data in displayable EBCDIC, then translate back to UTF-8 before saving to disk or sending the data to a TPF system.

For example, an input message may contain an extended Latin vowel, say, x'51' from EBCDIC codepage 1047, and the UTF-8 equivalent of this is actually a two-byte value of x'C3AA'.

I have been doing just fine using the standard XLATE A2E and E2A until this new requirement to support the Latin characters (accented vowels and consonants).

Is there no other way to do this than write my own translate table? Even that is not so straightforward, as the characters are not all a byte-for-byte substitution.

After days of searching, I can't think of any other way than to write a pure rexx routine that loops through the entire string, substituting some bytes for a different byte, and "some" bytes for a two-byte substitution. And of course I have to go both ways.

What makes this really unappealing is that I am dealing with a message driver that sends tens of thousands of messages a second (1K-5K bytes in length for each message), and I can't help but feel that taking the time for a rexx routine to do this translation is going to noticeably slow things down.

I'm really surprised that with all the Internet UTF-8 usage "out there" that there isn't a way to do this with a "built-in" routine.

Anyone else have to deal with UTF-8 to EBCDIC and back?

John

------------------------------

End of CMS-PIPELINES Digest - 1 Nov 2012 to 19 Nov 2012 (#2012-22)
******************************************************************
