On Fri, 2015 May 1 16:26+0000, Thorsten Glaser wrote:

> > [ set the compiler charset option ]
> > themselves anyway. A Build.sh option just seems like overkill to me.
>
> Right, but the idea is that *if* we need the selected charset in
> mksh anyway, it’s easier and cheaper to do it like that. If not,
> then, sure.
I don't think there would be any instance where the code needs to know
"I am building for EBCDIC 1047" vs. "for EBCDIC 037" or whatnot. The
transcoding of char/string literals should be all that's needed. (I
think the compiler throws an error if some literal character cannot be
transcoded, so we would not need to worry about EBCDIC variants that
lack basic ASCII punctuation or lowercase letters.)

> > Couldn't you just keep the backslash-newlines in the *.gen files? Are
>
> Several shells’ read -r option is broken, so if I use that, I get the
> exact same problem for a different set of ancient shells ☹
>
> I could write it all on one line in the source tree, but that’s like
> giving up. I convert them for the releases already anyway… but I
> feel like stupid right now.
>
> The files have only one user each, so I can just pull the macros out
> of them (I think – without looking). Duh!

The multi-line macros don't appear to have any build-time variable
parts... I think moving them to normal source files would be a sensible
solution. Avoid the problem altogether!

> [ host C tool ]
> > It's possible some chtag(1) tagging might be needed, as encodings
> > could potentially get mixed up in certain instances. (This was the
> > case with a -qascii Bash build)
>
> Sure, we’ll see about this when it becomes current. I assume that’s
> just a line “if OS/390 then chtag” more in Build.sh.

Exactly.

> > Is it convention to name the binaries differently for nonstandard
> > variants? (E.g. the native Win32 port would also have modified
> > names?)
>
> I wish for it to be convention, so they don’t accidentally get
> used when
>	#!/usr/bin/env mksh
> is a script’s shebang line. Granted, you can possibly check
> $KSH_VERSION but tbh that’s like enabling UTF-8 mode for scripts by
> default if the current locale is Unicode: too many scripts (the
> majority) implicitly assume LC_ALL=C and don’t set that. So no, I
> prefer to not put this burden on the script writers.
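For reference, a minimal sketch of what such a $KSH_VERSION probe would look like in a script (the "MIRBSD KSH" prefix here is an assumption based on current mksh version strings, and variant builds would of course need their own patterns, which is exactly the fragility being discussed):

```shell
# Sketch: detect mksh at runtime via $KSH_VERSION instead of trusting
# the binary name. Runs in any POSIX shell; ${KSH_VERSION:-} avoids a
# "parameter not set" error under set -u in shells that don't define it.
case ${KSH_VERSION:-} in
*MIRBSD\ KSH*)
	echo "running under mksh" ;;
*)
	echo "not mksh (or a variant with a different version string)" ;;
esac
```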
I suppose you could leave it up to the user to create e.g. a
mksh -> zmksh symlink.

> > believe the only case where this could become an issue is when you
> > have mismatched code pages (e.g. EBCDIC 1047 mksh + EBCDIC 037 user),
> > and then you pray that as many code points agree as possible. This,
> > IMO, falls squarely in the category of "user caveat."
>
> So we assume EBCDIC 1047 mksh + EBCDIC 037 user is allowed to fail,
> and we only really have to support the code page used at compilation.

Yes, exactly. To do otherwise would be way too much work, for a
platform with too few users (short of the possibilities opened up by
internal Unicode representation).

> > This situation could change, however, once mksh is doing UTF-16
> > internally. Then, because it has to translate everything to and from
> > the outside world anyway, I see no reason why it couldn't use a 1047
> > table for user A, and a 037 table for user B. Perhaps even straight
> > UTF-8 for
>
> I think there’s two things speaking out against this:
>
> • the compiler transcodes the strings and chars already anyway, and
>   we rely on that too much

If all of mksh's input/output is being filtered via conversion tables
to/from UTF-16, then a straight ASCII build could support EBCDIC. Heck,
you could configure mksh on BSD/Linux to talk EBCDIC if you like! (It
wouldn't be very useful, but it would be a nifty proof of concept. Your
main concern there would be avoiding "ASCII leaks"---instances of ASCII
text being written to the terminal without going through the conversion
routines.)

> • there is a speed and simplicity advantage of having only one
>   charset
>
> Weak reasons, but as this is already very tricky should be kept
> in mind.

If you're filtering everything through conversion tables anyway, then
using table A versus table B should have little impact on performance.
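In fact you can already play with the "talk EBCDIC on an ASCII box" idea today, since POSIX dd ships built-in ASCII/EBCDIC translation tables usable as a pipeline filter (note the caveat: dd's traditional table is the old 037-flavored one, not 1047, so this is a sketch of the filtering concept rather than of any particular z/OS code page):

```shell
# Round-trip text through dd's built-in ASCII<->EBCDIC tables.
# conv=ebcdic converts ASCII to EBCDIC; conv=ascii converts back.
printf 'hello' | dd conv=ebcdic 2>/dev/null | od -An -tx1
# EBCDIC lowercase: h=0x88 e=0x85 l=0x93 l=0x93 o=0x96

printf 'hello' | dd conv=ebcdic 2>/dev/null \
               | dd conv=ascii  2>/dev/null
# round-trips back to: hello
```

That second pipeline is essentially the shape an "everything goes through a conversion routine" build would have, just with the table selected per user instead of baked in.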
As for simplicity, well, I'd say that horse has left the UTF-16 barn :]

> > I'm not sure about "Z/OS MKSH", however, if the -qascii build would
> > have "MIRBSD MKSH". Both are z/OS, after all, and the only thing
> > significantly different about the EBCDIC build is, well, EBCDIC.
>
> OK. So how about “EBCDIC MKSH” for zmksh, keeping “MIRBSD KSH” for
> mksh (historic reasons, I’d use MKSH there nowadays).

I think that sounds right. Maybe call the binary "emksh"? As much as
IBM's marketing uses z/This and z/That, you don't see it a whole lot
inside the actual environment...

> > (Couldn't get uhr to work with R50 on my Debian system, however...
> > lots of "no coprocess" errors...)
>
> Huh.
>
> tg@tglase-eee:~ $ zcat /usr/share/doc/mksh/examples/uhr.gz | mksh
>
> This works OOTB for me. But you do have to install bc(1) first;
> unlike real Unix systems, absolutely-basic-should-be-everywhere
> tools like bc, ed, uudecode are not installed by default on GNU.

Ah, that was it---bc was not installed. Very nice hack! I do prefer
analog clocks myself.

> Now go get some sleep ;-)

Still working on it...


--Daniel

-- 
Daniel Richard G. || [email protected]
My ASCII-art .sig got a bad case of Times New Roman.
