Paul Gilmartin wrote:
>Why is there UTF-16?
>[....]
>o It lacks the compactness of UTF-8 in the case of Latin text.

Japanese Kanji, Traditional Chinese, Simplified Chinese, and emoji (!), for
example, are not Latin text. More than 1.5 billion people is a lot of
people, and that's not counting all the billions of emoji users. :-)
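To make the compactness trade-off concrete, here's a small illustrative Java
sketch (example strings are my own) comparing encoded sizes. For Latin text
UTF-8 is smaller, but for CJK text the advantage flips to UTF-16:

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    public static void main(String[] args) {
        String latin = "Hello";              // 5 Latin letters
        String kanji = "\u65e5\u672c\u8a9e"; // "Japanese" in Kanji, 3 CJK characters

        // Latin text: UTF-8 wins (1 byte per character vs. 2)
        System.out.println(latin.getBytes(StandardCharsets.UTF_8).length);    // 5
        System.out.println(latin.getBytes(StandardCharsets.UTF_16BE).length); // 10

        // CJK text: UTF-16 wins (2 bytes per character vs. 3)
        System.out.println(kanji.getBytes(StandardCharsets.UTF_8).length);    // 9
        System.out.println(kanji.getBytes(StandardCharsets.UTF_16BE).length); // 6
    }
}
```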

And who cares about this compactness, really? Bytes are no longer *that*
precious, especially when they're compressed anyway.

>(What does Java use internally?)

UTF-16, as it happens.
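You can see Java's internal UTF-16 representation directly: a String's
length() counts UTF-16 code units, not characters, so an emoji outside the
Basic Multilingual Plane occupies a surrogate pair. A quick demonstration:

```java
public class Utf16Internals {
    public static void main(String[] args) {
        String smiley = "\uD83D\uDE00"; // U+1F600 "grinning face", one code point

        System.out.println(smiley.length());                           // 2 UTF-16 code units
        System.out.println(smiley.codePointCount(0, smiley.length())); // 1 code point
        System.out.println(Character.isHighSurrogate(smiley.charAt(0))); // true
    }
}
```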

FYI, if DB2 for z/OS is in the loop then DB2 will convert UTF-8 to UTF-16
for your PL/I application(s). Just store the UTF-8 data in DB2, use the
WIDECHAR datatype, and it all happens automagically, effortlessly, with no
UTF-8 to UTF-16 programming required. See here for more information:

https://www.ibm.com/support/knowledgecenter/en/SSEPEK_12.0.0/char/src/tpc/db2z_processunidatapli.html
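DB2's conversion is transparent to your PL/I code, but purely for
illustration, this Java sketch shows the equivalent round trip: UTF-8 bytes
(as stored) decoded into UTF-16 (Java's internal form, as DB2 hands your
application WIDECHAR data) and re-encoded losslessly. The sample string is my
own invention:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8ToUtf16 {
    public static void main(String[] args) {
        // UTF-8 bytes as they would be stored
        byte[] utf8 = "caf\u00e9 \u65e5\u672c".getBytes(StandardCharsets.UTF_8);

        // Decoding produces a Java String, i.e. UTF-16 internally
        String utf16 = new String(utf8, StandardCharsets.UTF_8);

        // Re-encoding recovers the identical bytes: the round trip is lossless
        byte[] back = utf16.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(utf8, back)); // true
    }
}
```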

If for some odd reason you absolutely insist on an EBCDIC-ish approach then
you can do what the Japanese have done for decades: Shift Out (SO), Shift
In (SI). Refer to CCSID 930 and CCSID 1390 for inspiration. You'd probably
use one of the EBCDIC Latin 1+euro codepages as a starting point, such as
1140, then SO/SI from there to pick up the exceptional characters.
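If you do go the SO/SI route, your code has to track the shift state while
walking the byte string: SO (0x0E) switches to double-byte mode, SI (0x0F)
switches back. Here's a minimal sketch of that scanning logic in Java; the
byte values in main() are hypothetical placeholders, not real CCSID 930 data:

```java
public class ShiftScan {
    // EBCDIC mixed-data shift controls (CCSID 930/1390 family)
    private static final byte SO = 0x0E; // Shift Out: enter double-byte mode
    private static final byte SI = 0x0F; // Shift In: back to single-byte mode

    /** Counts logical characters in an SO/SI-delimited mixed byte string. */
    static int countChars(byte[] mixed) {
        boolean doubleByte = false;
        int count = 0;
        for (int i = 0; i < mixed.length; i++) {
            byte b = mixed[i];
            if (b == SO) {
                doubleByte = true;
            } else if (b == SI) {
                doubleByte = false;
            } else if (doubleByte) {
                i++;       // consume the second byte of a 2-byte character
                count++;
            } else {
                count++;   // single-byte character
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical: 2 single-byte chars, SO, 2 double-byte chars, SI
        byte[] mixed = { (byte) 0xC1, (byte) 0xC2, 0x0E,
                         0x44, 0x55, 0x66, 0x77, 0x0F };
        System.out.println(countChars(mixed)); // 4
    }
}
```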

--------------------------------------------------------------------------------------------------------
Timothy Sipples
IT Architect Executive, Industry Solutions, IBM z Systems, AP/GCG/MEA
E-Mail: [email protected]

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
