Paul Gilmartin wrote:

>Why is there UTF-16?
>[....]
>o It lacks the compactness of UTF-8 in the case of Latin text.
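The size trade-off in that quote is easy to measure directly. A minimal Java sketch (the sample strings are my own, chosen for illustration) comparing byte counts for Latin text, BMP CJK text, and an emoji:

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    public static void main(String[] args) {
        // ASCII/Latin text: 1 byte per character in UTF-8, 2 in UTF-16.
        String latin = "hello";
        // BMP CJK characters: 3 bytes each in UTF-8, only 2 in UTF-16.
        String kanji = "\u65E5\u672C\u8A9E"; // "Japanese" (nihongo)
        // Emoji beyond the BMP: 4 bytes in both (UTF-16 surrogate pair).
        String emoji = new String(Character.toChars(0x1F600)); // grinning face

        System.out.println("latin: UTF-8="
                + latin.getBytes(StandardCharsets.UTF_8).length
                + " UTF-16=" + latin.getBytes(StandardCharsets.UTF_16BE).length);
        System.out.println("kanji: UTF-8="
                + kanji.getBytes(StandardCharsets.UTF_8).length
                + " UTF-16=" + kanji.getBytes(StandardCharsets.UTF_16BE).length);
        System.out.println("emoji: UTF-8="
                + emoji.getBytes(StandardCharsets.UTF_8).length
                + " UTF-16=" + emoji.getBytes(StandardCharsets.UTF_16BE).length);
    }
}
```

So UTF-8 only "wins" on Latin text (5 vs. 10 bytes here); for CJK text the advantage flips to UTF-16 (6 vs. 9 bytes), and for emoji it's a wash.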
Japanese Kanji, Traditional Chinese, Simplified Chinese, and emoji (!), as
examples, are not Latin text. More than 1.5 billion people is a lot of
people, and that's not counting all the billions of emoji users. :-) And
who cares about this compactness, really? Bytes are no longer *that*
precious, especially when they're compressed anyway.

>(What does Java use internally?)

UTF-16, as it happens.

FYI, if DB2 for z/OS is in the loop then DB2 will convert UTF-8 to UTF-16
for your PL/I application(s). Just store the UTF-8 data in DB2, use the
WIDECHAR data type, and it all happens automagically, effortlessly, with no
UTF-8 to UTF-16 programming required. See here for more information:

https://www.ibm.com/support/knowledgecenter/en/SSEPEK_12.0.0/char/src/tpc/db2z_processunidatapli.html

If for some odd reason you absolutely insist on an EBCDIC-ish approach,
then you can do what the Japanese have done for decades: Shift Out (SO),
Shift In (SI). Refer to CCSID 930 and CCSID 1390 for inspiration. You'd
probably use one of the EBCDIC Latin-1+euro code pages as a starting point,
such as 1140, then SO/SI from there to pick up the exceptional characters.

--------------------------------------------------------------------------------------------------------
Timothy Sipples
IT Architect Executive, Industry Solutions, IBM z Systems, AP/GCG/MEA
E-Mail: [email protected]

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
