Re: \uXXXX on EBCDIC systems (was Re: [PATCH] IBM z/OS + EBCDIC support)

Daniel Richard G. Wed, 03 May 2017 15:56:35 -0700

Hi Thorsten,

On Wed, 2017 May  3 15:57+0000, Thorsten Glaser wrote:
> Dixi quod…
> 
> >Use U+4DC0 HEXAGRAM FOR THE CREATIVE HEAVEN (䷀) then ☺
> 
> I *do* have a follow-up question for that now.
> 
> The utf8bug-1 test fails because its output is interpreted as UTF-8,
> but the UTF-8 string it should match was treated as “extended ASCII”
> and is thus converted…
> 
> So, the situation as it is right now is:
> 
> print -n '0\u4DC0' outputs the following octets:
> - on an ASCII system : 30 E4 B7 80
> - on an EBCDIC system: F0 E4 B7 80
> 
> That is, “0” is output in the native codepage, and the Unicode
> value is output as real UTF-8 octets.


This kind of weirdness is but one reason why z/Linux (Linux on z Systems
hardware, as opposed to z/OS) is eating Unix System Services alive :]

> Now you say UTF-8 is not really used on z/OS or EBCDIC systems
> in general, so I was considering the following heresy:
> - output: F0 43 B3 20
> 
> That is, convert UTF-8 output, before actually outputting it,
> as if it were “extended ASCII”, to EBCDIC.
> 
> Converting F0 43 B3 20 from EBCDIC(1047) to “extended ASCII”
> yields 30 E4 B7 80 by the way, see above. (Typos in the manual
> conversion notwithstanding.)
> 
> This would allow more consistency doing all those conversions
> (which are done automatically). If it doesn’t diminish the
> usefulness of mksh on EBCDIC systems I’d say go for it.
> 
> Comments?

While UTF-8 isn't a thing in the z/OS environment, I think there could
be value in printing something that will be converted by the existing
EBCDIC->ASCII terminal/NFS conversion into correctly-formed UTF-8
characters.
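
A minimal Python sketch of the conversion proposed above, under two
assumptions: the "extended ASCII" side is ISO 8859-1, and Python's cp037
codec stands in for code page 1047 (as far as I know the standard library
has no cp1047 codec; the two pages differ in a handful of positions, none
of which should matter for the octets here). The name nega_utf8 is made up
for illustration and is not anything in mksh.

```python
def nega_utf8(text: str) -> bytes:
    """Encode text as UTF-8, then map each octet to EBCDIC as if it were Latin-1."""
    utf8_octets = text.encode("utf-8")
    # Treat every UTF-8 octet as one ISO 8859-1 character, then encode to EBCDIC.
    # cp037 is an assumption: a stand-in for cp1047.
    return utf8_octets.decode("latin-1").encode("cp037")


utf8 = "0\u4DC0".encode("utf-8")
print(utf8.hex(" ").upper())                  # real UTF-8 octets
print(nega_utf8("0\u4DC0").hex(" ").upper())  # octets after the EBCDIC mapping
```

Under those assumptions the first line should read 30 E4 B7 80 and the
second F0 43 B3 20, matching the octets quoted above.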

To wit: Say I have a UTF-8-encoded file in NFS, and I view it via a
text-mode NFS mount on z/OS. If I view it in less(1), then the high
characters are shown as arbitrary byte sequences (e.g. U+00F7 DIVISION
SIGN, which is C3 B7 in UTF-8, shows up as "<66><B3>"). But if I just
"cat" the file, then it renders correctly in the terminal. Effectively
an ASCII->EBCDIC->ASCII round trip.
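
The round trip can be sketched in Python as follows. This is only an
illustration: it assumes the NFS/terminal conversion is a plain
byte-for-byte ISO 8859-1 <-> EBCDIC mapping, and it uses Python's cp037
codec as a stand-in for code page 1047. The helper names are hypothetical.

```python
EBCDIC = "cp037"  # assumption: stand-in for cp1047


def ascii_to_ebcdic(octets: bytes) -> bytes:
    """Byte-for-byte conversion, as a text-mode NFS mount might apply on read."""
    return octets.decode("latin-1").encode(EBCDIC)


def ebcdic_to_ascii(octets: bytes) -> bytes:
    """The reverse mapping, as applied on the way out to an ASCII terminal."""
    return octets.decode(EBCDIC).encode("latin-1")


stored = "\u00F7".encode("utf-8")       # DIVISION SIGN as stored in the file
seen_on_zos = ascii_to_ebcdic(stored)   # the octets a pager on z/OS would see
back = ebcdic_to_ascii(seen_on_zos)     # the octets that reach the terminal

print(stored.hex(" ").upper(), "->", seen_on_zos.hex(" ").upper(),
      "->", back.hex(" ").upper())
```

Because the mapping is one-to-one over all 256 octet values, the final
octets equal the stored ones, which is why "cat" renders the character
while less(1), looking at the intermediate EBCDIC octets, does not.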

I don't know if there are use cases where this may yield unintuitive
results... perhaps if this "nega-UTF-8" were redirected to a file and
then processed further in z/OS, that may lead to some surprises. But in
terms of doing something sensible when using a "\uNNNN" escape in an
environment that shouldn't support it, it seems no worse than producing
actual UTF-8 bytes.


--Daniel


-- 
Daniel Richard G. || sk...@iskunk.org
My ASCII-art .sig got a bad case of Times New Roman.
