Charles: Good points well made. Yes, I agree that UTF-16 offers no
advantage to me. UTF-32 has to be considered for performance in string
handling functions. I may end up defaulting to UTF-8 on disc, and
converting to the others when needed.
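
A minimal sketch of that plan in Python, for illustration only: the function names (encoded_sizes, load_utf8, store_utf8) are hypothetical and not part of the system described here, and a list of integers merely stands in for a UTF-32 buffer. It compares the byte cost of each encoding and shows a UTF-8-on-disc, code-points-in-memory round trip.

```python
# Illustrative sketch only; function names are hypothetical.

def encoded_sizes(text):
    """Return the byte length of `text` in each Unicode encoding form."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # The "-le" variants fix the byte order, so no byte-order mark is added.
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

def load_utf8(data):
    """Decode UTF-8 bytes (the on-disc form) into a list of code points.

    The list stands in for a UTF-32 buffer: one fixed-size element per
    code point, so indexing by position is constant time.
    """
    return [ord(ch) for ch in data.decode("utf-8")]

def store_utf8(code_points):
    """Encode a list of code points back to UTF-8 bytes for writing out."""
    return "".join(chr(cp) for cp in code_points).encode("utf-8")

if __name__ == "__main__":
    # ASCII, accented Latin, three CJK characters, one character outside the BMP.
    for sample in ["CV", "r\u00e9sum\u00e9", "\u5c65\u6b74\u66f8", "\U0001F600"]:
        print(repr(sample), len(sample), "code points:", encoded_sizes(sample))
```

The last sample (U+1F600) takes 4 bytes in all three encodings; in UTF-16 that is a surrogate pair, which is the case that breaks the 1:2 assumption.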

The system's source and compiler (crude but working) are all written in
7-bit ASCII to keep things simple, but data can be any value--I'm not a big
fan of stringz :-)

To be frank, I doubt it will have more than 1 user, but I won't be happy
until I can write and print my CV on it, so I might as well make some
sensible decisions now :-)

Thank you.

Rupert

On Thu., Aug. 20, 2020, 15:17 Charles Mills, <[email protected]> wrote:

> Not exactly the question you asked, but IMHO if one were writing a
> "system" (OS, DBMS, application family) today one would be foolish to
> restrict one's customers to 95 or so printable characters. You would be (1)
> writing off all of Asia and (2) condemning much of Europe and northern
> Africa to either second class status, or the constant code page shuffle
> ("what character is x'80'? Well, it depends where you are.")
>
> Other than the above you have three choices:
>
> - UTF-8, which will represent every character in the world, is almost as
> compact as ASCII, and can be treated as ASCII for quick-and-dirty purposes
> like debugging displays. What you give up is the comforting knowledge that
> characters are always, always, always one to one with bytes.
>
> - UTF-32. Like UTF-8, but you gain a fixed relationship between characters
> and bytes (1:4) at a cost in storage. You might counter that storage is
> cheap these days.
>
> - I am not a Windows-basher, but I think Windows' choice of UTF-16 is the
> worst of both worlds. It consumes twice the storage of ASCII, with the
> tradeoff that you can almost, almost, almost count on a fixed relationship
> between characters and bytes (1:2). The problem is that you cannot quite
> count on it -- some characters take 32 bits (a surrogate pair) -- and if you
> have supported code running out in the field, you know that code that works
> 99.9% of the time is much more problematic than code that works 95% of the
> time (as a routine that assumed UTF-8 was 1:1 with bytes would).
>
> Most Web pages, the Go Language, and Db2 (I am told) all use UTF-8
> internally.
>
> Charles
>
>
> -----Original Message-----
> From: IBM Mainframe Discussion List [mailto:[email protected]] On
> Behalf Of Rupert Reynolds
> Sent: Thursday, August 20, 2020 5:55 AM
> To: [email protected]
> Subject: EBCDIC and other systems
>
> I'm writing a new OS for PC hardware (an exercise started during
> lockdown/furlough) and I wondered about files from other systems. Is there
> much in DBCS on mainframe systems these days, or is it still mainly the
> same old 8-bit EBCDIC, please?
>
> I still have to decide whether to support UTF-8 and/or UTF-32, of course
> :-)
>
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: INFO IBM-MAIN
>
