Not exactly the question you asked, but IMHO if one were writing a "system"
(OS, DBMS, application family) today one would be foolish to restrict one's
customers to 95 or so printable characters. You would be (1) writing off all of
Asia and (2) condemning much of Europe and northern Africa to either
second-class status or the constant code page shuffle ("what character is
x'80'? Well, it depends where you are").
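
A small illustration of that shuffle, sketched in Python (my choice of language for the example; the helper name decode_under is made up, and the code pages listed are just a handful I picked, including EBCDIC code page 037). It decodes the single byte x'80' under each one and prints the resulting Unicode code point:

```python
def decode_under(raw, codecs):
    """Decode the same bytes under each named single-byte code page."""
    return {name: raw.decode(name) for name in codecs}

# Hypothetical selection of code pages, chosen only for illustration.
CODE_PAGES = ("cp1252", "cp437", "cp1251", "cp037")

result = decode_under(b"\x80", CODE_PAGES)
for name, ch in result.items():
    # Print the code point rather than the glyph so the output is plain ASCII.
    print(f"{name}: U+{ord(ch):04X}")
```

Under the first three the byte comes out as the euro sign, a capital C with cedilla, and a Cyrillic letter respectively, which is the "it depends where you are" problem in miniature.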
Once you go beyond a single-byte character set, you have three choices:
- UTF-8, which can represent every character in Unicode, is exactly as compact
as ASCII for plain ASCII text, and can be treated as ASCII for quick-and-dirty
purposes like debugging displays. What you give up is the comforting knowledge
that characters are always, always, always one to one with bytes: a character
takes anywhere from one to four bytes.
- UTF-32. Like UTF-8, but you gain a fixed relationship between characters
(strictly speaking, code points) and bytes (1:4) at a cost in storage. You
might counter that storage is cheap these days.
- UTF-16. I am not a Windows-basher, but I think Windows' choice of UTF-16 is
the worst of both worlds. For ASCII text it consumes twice the storage, with
the tradeoff that you can almost, almost, almost count on a fixed relationship
between characters and bytes (1:2). The problem is that you cannot quite count
on it -- characters outside the Basic Multilingual Plane take two 16-bit units
(a surrogate pair), 32 bits in all -- and if you have supported code that is
running out in the field you know that code that works 99.9% of the time is
much more problematic than code that works 95% of the time. A routine that
assumed UTF-8 was 1:1 with bytes would be in the second category: it fails
often enough to get caught early.
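
To put numbers on the three choices, here is a rough sketch in Python (again my choice for illustration; the function name sizes and the four sample characters are assumptions of mine). It counts the bytes each encoding needs for a single code point; the "-le" codec variants are used so that no byte order mark is added to the count:

```python
def sizes(text):
    """Return the code point count and encoded byte lengths of text."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(text.encode("utf-16-le")),
        "utf32_bytes": len(text.encode("utf-32-le")),
    }

# Sample characters picked for illustration:
# Latin A, e-acute, euro sign, and an emoji outside the BMP.
SAMPLES = ("A", "\u00e9", "\u20ac", "\U0001F600")

for sample in SAMPLES:
    # ascii() keeps the printed output plain ASCII on any terminal.
    print(ascii(sample), sizes(sample))
```

The four samples need 1, 2, 3, and 4 bytes in UTF-8; 2, 2, 2, and 4 bytes in UTF-16; and 4 bytes each in UTF-32. The last sample is the case that breaks a "UTF-16 is always two bytes" assumption.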
Most Web pages, the Go language, and Db2 (I am told) all use UTF-8.
Charles
-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf
Of Rupert Reynolds
Sent: Thursday, August 20, 2020 5:55 AM
To: [email protected]
Subject: EBCDIC and other systems
I'm writing a new OS for PC hardware (an exercise started during
lockdown/furlough) and I wondered about files from other systems. Is there
much in DBCS on mainframe systems these days, or is it still mainly the
same old 8-bit EBCDIC, please?
I still have to decide whether to support UTF-8 and/or UTF-32, of course :-)