Re: [CMS-PIPELINES] CMS-PIPELINES Digest - 1 Nov 2012 to 19 Nov 2012 (#2012-22)

Phil Smith III Tue, 20 Nov 2012 06:23:48 -0800

You don't want a routine, you need a whole service. This is a huge topic,
and you might want to start by reading the Wikipedia page on Unicode.


You need to understand the difference between a character set (charset, code
page) and an encoding (such as UTF-8, which is one encoding of Unicode).

ICU and ICONV could presumably be ported to CMS. Note that ICONV at least
will throw an error if it finds a character that isn't in the input
charset--this can be a nasty surprise. Haven't tested ICU.
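
To illustrate the kind of surprise meant here, this is a minimal sketch
using Python's built-in codecs as a stand-in for iconv (an analogy only; it
is not iconv and says nothing about how ICU behaves). The sample bytes are
made up for the illustration.

```python
# Python codecs used as an analogy for a strict converter such as iconv.
data = b"caf\xe9"  # Latin-1 bytes that are NOT valid UTF-8 (hypothetical input)

try:
    data.decode("utf-8")  # default errors="strict"
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict conversion failed:", exc.reason)

# Asking for replacement trades the error for a U+FFFD marker in the output.
lenient = data.decode("utf-8", errors="replace")
print(repr(lenient))
```

Whether to fail hard or substitute a marker is a policy decision the
service has to make up front; either way the bad byte does not pass through
silently.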

...phsiii
-----Original Message-----
From: CMSTSO Pipelines Discussion List [mailto:[email protected]]
On Behalf Of CMS-PIPELINES automatic digest system
Sent: Tuesday, November 20, 2012 12:03 AM
To: [email protected]
Subject: CMS-PIPELINES Digest - 1 Nov 2012 to 19 Nov 2012 (#2012-22)

There is 1 message totalling 51 lines in this issue.

Topics of the day:

  1. XLATE (or any method) for converting UTF-8 to/from EBCDIC.

----------------------------------------------------------------------

Date:    Mon, 19 Nov 2012 18:50:11 +0000
From:    "Larson, John E." <[email protected]>
Subject: Re: XLATE (or any method) for converting UTF-8 to/from EBCDIC.

I have been searching and searching (Google, all VM documentation I can
find, pipeline forum and history pages, etc.) for days and can't figure out
how to deal with ASCII UTF-8.

I've read on one site that codepage 1207/1208 "might" be the way, but of
course I can't find any more specifics about these codepages, and I'm not so
interested in them anyway as they're not supported by XLATE.

My requirement seems simple enough: update a CMS XML browser to display
ASCII UTF-8 data in displayable EBCDIC, then translate it back to UTF-8
before saving to disk or sending the data to a TPF system.

For example, an input message may contain an extended Latin vowel, say,
x'51' from EBCDIC codepage 1047, and the UTF-8 equivalent of this is actually
a two-byte value of x'C3AA'.
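
As a worked check of the byte values in this example, here is a small
sketch in Python. One loud assumption: it uses the cp037 codec as a stand-in
for code page 1047, because as far as I know Python's standard library has
no cp1047 codec. The two pages agree on the accented letters but differ in a
handful of symbol positions, so verify against the real 1047 table.

```python
# Assumption: cp037 stands in for code page 1047 (accented letters match).
ebcdic_byte = b"\x51"

char = ebcdic_byte.decode("cp037")
utf8_bytes = char.encode("utf-8")
print(char, utf8_bytes.hex())  # the character at x'51' and its UTF-8 bytes

# Going the other way from the two-byte value quoted in the message.
quoted = bytes.fromhex("c3aa")
back = quoted.decode("utf-8").encode("cp037")
print(quoted.decode("utf-8"), back.hex())
```

Under cp037, x'51' decodes to e-acute, whose UTF-8 form is x'C3A9'; the
quoted x'C3AA' is e-circumflex, which sits one slot later at x'52'. Either
way the point stands: one EBCDIC byte becomes two UTF-8 bytes.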

I have been doing just fine using the standard XLATE A2E and E2A until this
new requirement to support the Latin characters (accented vowels and
consonants).

Is there no other way to do this than write my own translate table?  Even
that is not so straightforward, as the characters are not all a
byte-for-byte substitution.

After days of searching, I can't think of any other way than to write a pure
rexx routine that loops through the entire string, substituting some bytes
for a different byte, and "some" bytes to a two-byte substitution.

And of course I have to go both ways.
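
For comparison with the hand-rolled loop described above, this is a rough
sketch of the two-way conversion on a platform where codec support already
exists. It is Python, not REXX or a pipeline stage, and it is not a CMS
solution. The function names utf8_to_ebcdic and ebcdic_to_utf8 are
hypothetical, and cp037 is again assumed as a stand-in for code page 1047.

```python
# Assumption: cp037 stands in for the poster's code page 1047.
EBCDIC_CODEC = "cp037"


def utf8_to_ebcdic(data: bytes, errors: str = "strict") -> bytes:
    """Decode UTF-8 bytes, then re-encode as single-byte EBCDIC."""
    return data.decode("utf-8").encode(EBCDIC_CODEC, errors=errors)


def ebcdic_to_utf8(data: bytes) -> bytes:
    """Decode single-byte EBCDIC, then re-encode as UTF-8."""
    return data.decode(EBCDIC_CODEC).encode("utf-8")


wire = "caf\u00e9".encode("utf-8")  # 5 bytes: the accented letter takes two
host = utf8_to_ebcdic(wire)          # 4 bytes: one per character
print(len(wire), len(host))
print(ebcdic_to_utf8(host) == wire)
```

Going through an intermediate character string in both directions is what
removes the "some bytes become two bytes" bookkeeping: the length change
falls out of the encode and decode steps rather than a custom table.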

What makes this really unappealing is that I am dealing with a message
driver that sends tens of thousands of messages a second (1K-5K bytes in
length for each message), and I can't help but feel that taking the time for
a rexx routine to do this translation is going to noticeably slow things
down.

I'm really surprised that with all the Internet UTF-8 usage "out there" that
there isn't a way to do this with a "built-in" routine.

Anyone else have to deal with UTF-8 to EBCDIC and back?

John

------------------------------

End of CMS-PIPELINES Digest - 1 Nov 2012 to 19 Nov 2012 (#2012-22)
******************************************************************
