Hi Thorsten,

On Wed, 2017 May 3 15:57+0000, Thorsten Glaser wrote:
> Dixi quod…
>
> >Use U+4DC0 HEXAGRAM FOR THE CREATIVE HEAVEN (䷀) then ☺
>
> I *do* have a follow-up question for that now.
>
> The utf8bug-1 test fails because its output is interpreted as UTF-8,
> but the UTF-8 string it should match was treated as “extended ASCII”
> and is thus converted…
>
> So, the situation as it is right now is:
>
> print -n '0\u4DC0' outputs the following octets:
> - on an ASCII system : 30 E4 B7 80
> - on an EBCDIC system: F0 E4 B7 80
>
> That is, “0” is output in the native codepage, and the Unicode
> value is output as real UTF-8 octets.

This kind of weirdness is but one reason why z/Linux (Linux on z/OS) is
eating Unix System Services alive :]

> Now you say UTF-8 is not really used on z/OS or EBCDIC systems
> in general, so I was considering the following heresy:
> - output: F0 43 B3 20
>
> That is, convert UTF-8 output, before actually outputting it,
> as if it were “extended ASCII”, to EBCDIC.
>
> Converting F0 43 B3 20 from EBCDIC(1047) to “extended ASCII”
> yields 30 E4 B7 80 by the way, see above. (Typos in the manual
> conversion notwithstanding.)
>
> This would allow more consistency doing all those conversions
> (which are done automatically). If it doesn’t diminish the
> usefulness of mksh on EBCDIC systems I’d say go for it.
>
> Comments?

While UTF-8 isn't a thing in the z/OS environment, I think there could
be value in printing something that will be converted by the existing
EBCDIC->ASCII terminal/NFS conversion into correctly-formed UTF-8
characters.

To wit: Say I have a UTF-8-encoded file in NFS, and I view it via a
text-mode NFS mount on z/OS. If I view it in less(1), then the high
characters are shown as arbitrary byte sequences (e.g. "DIVISION SIGN"
is "<66><B3>"). But if I just "cat" the file, then it renders correctly
in the terminal. Effectively an ASCII->EBCDIC->ASCII round trip.

I don't know if there are use cases where this may yield unintuitive
results... perhaps if this "nega-UTF-8" were redirected to a file and
then processed further in z/OS, that may lead to some surprises. But in
terms of doing something sensible when using a "\uNNNN" escape in an
environment that shouldn't support it, it seems no worse than producing
actual UTF-8 bytes.

--Daniel

-- 
Daniel Richard G. || sk...@iskunk.org
My ASCII-art .sig got a bad case of Times New Roman.
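
For anyone who wants to check the octets discussed above without an EBCDIC box at hand, here is a minimal Python sketch of the proposed conversion and of the terminal/NFS round trip. It rests on an assumption: it uses Python's "cp037" codec as a stand-in for code page 1047, since I am not sure a 1047 codec ships with Python. As far as I know the two pages differ only in a handful of punctuation positions, none of which are touched by the bytes in this thread. The function names nega_utf8 and terminal_view are made up for illustration and have nothing to do with mksh's actual code.

```python
# ASSUMPTION: cp037 stands in for EBCDIC code page 1047 here.
EBCDIC = "cp037"


def nega_utf8(text: str) -> bytes:
    """Encode text as UTF-8, then convert those octets to EBCDIC
    as if they were "extended ASCII" (Latin-1) characters."""
    utf8_octets = text.encode("utf-8")
    return utf8_octets.decode("latin-1").encode(EBCDIC)


def terminal_view(octets: bytes) -> bytes:
    """Apply the EBCDIC->ASCII conversion that a terminal or a
    text-mode NFS mount would perform on the way out."""
    return octets.decode(EBCDIC).encode("latin-1")


def hexdump(octets: bytes) -> str:
    return " ".join(f"{b:02X}" for b in octets)


if __name__ == "__main__":
    for label, text in (("0 + U+4DC0", "0\u4DC0"), ("DIVISION SIGN", "\u00F7")):
        converted = nega_utf8(text)
        print(f"{label}:")
        print("  UTF-8 octets   :", hexdump(text.encode("utf-8")))
        print("  after EBCDIC   :", hexdump(converted))
        print("  after terminal :", hexdump(terminal_view(converted)))
```

For '0\u4DC0' the "after EBCDIC" line should show the F0 43 B3 20 quoted above, and the "after terminal" line should be valid UTF-8 again (30 E4 B7 80). The DIVISION SIGN case should reproduce the <66><B3> seen in less(1).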