Re: EBCDIC and other systems

Charles Mills Thu, 20 Aug 2020 07:17:55 -0700

Not exactly the question you asked, but IMHO if one were writing a "system" 
(OS, DBMS, application family) today one would be foolish to restrict one's 
customers to 95 or so printable characters. You would be (1) writing off all of 
Asia and (2) condemning much of Europe and northern Africa to either 
second-class status or the constant code page shuffle ("what character is 
x'80'? Well, it depends where you are.")
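
To make the code page shuffle concrete, here is a minimal sketch that decodes the single byte x'80' under a few code pages. The choice of code pages and the helper name decode_under are only for illustration; the codecs themselves ship with standard Python.

```python
# One byte, several answers: decode x'80' under different single-byte code pages.
BYTE = bytes([0x80])

def decode_under(code_pages, raw=BYTE):
    """Hypothetical helper: map each code page name to the decoded character."""
    return {cp: raw.decode(cp) for cp in code_pages}

# cp1252 = Windows Western, cp437 = original IBM PC, latin-1 = ISO 8859-1,
# cp037 = one of the EBCDIC code pages.
results = decode_under(["cp1252", "cp437", "latin-1", "cp037"])

for cp, ch in results.items():
    print(f"{cp:8} -> U+{ord(ch):04X} {ch!r}")
```

Under cp1252 the byte is the euro sign, under cp437 it is a capital C with cedilla, and under ISO 8859-1 it is an unprintable control character, so the answer really does depend on where you are.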


If you go beyond a single-byte character set, you have three choices:

- UTF-8, which will represent every character in Unicode, is almost as compact 
as ASCII for mostly-ASCII text, and can be treated as ASCII for quick-and-dirty 
purposes like debugging displays. What you give up is the comforting knowledge 
that characters are always, always, always one to one with bytes.

- UTF-32. Like UTF-8, but you gain a fixed relationship between characters 
(strictly, code points) and bytes (1:4) at a cost in storage. You might counter 
that storage is cheap these days.

- I am not a Windows-basher, but I think Windows' choice of UTF-16 is the worst 
of both worlds. It consumes twice the storage of ASCII for ordinary Latin text, 
with the tradeoff that you can almost, almost, almost count on a fixed 
relationship between characters and bytes (1:2). The problem is that you cannot 
quite count on it: characters outside the Basic Multilingual Plane take two 
16-bit units (32 bits). If you have supported code that is running out in the 
field, you know that code that works 99.9% of the time is much more problematic 
than code that works 95% of the time (such as a routine that assumed UTF-8 was 
1:1 with bytes).
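
The storage tradeoffs among the three encodings can be checked with a short sketch. The function names encoded_sizes and naive_utf16_char_count are made up for this example, and the sample characters are arbitrary; the little-endian codec variants are used only so that no byte order mark is counted.

```python
def encoded_sizes(text):
    """Byte length of text in each encoding, without a byte order mark."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

def naive_utf16_char_count(raw):
    """The tempting shortcut: assume every character is exactly two bytes."""
    return len(raw) // 2

# Latin letter, accented letter, euro sign, and an emoji outside the BMP.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

# The shortcut is right for the first three and wrong for the emoji,
# which needs a surrogate pair (two 16-bit units) in UTF-16.
emoji = "\U0001F600"
print(len(emoji), naive_utf16_char_count(emoji.encode("utf-16-le")))
```

The last line prints 1 and 2: one character, but the two-bytes-per-character assumption counts two. That is the kind of bug that stays hidden until unusual data shows up in the field.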

Most Web pages, the Go language, and Db2 (I am told) all use UTF-8 internally.

Charles


-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On Behalf 
Of Rupert Reynolds
Sent: Thursday, August 20, 2020 5:55 AM
To: [email protected]
Subject: EBCDIC and other systems

I'm writing a new OS for PC hardware (an exercise started during
lockdown/furlough) and I wondered about files from other systems. Is there
much in DBCS on mainframe systems these days, or is it still mainly the
same old 8-bit EBCDIC, please?

I still have to decide whether to support UTF-8 and/or UTF-32, of course :-)

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
