Hah!

Hi again.

Your eMail requires at least three passes…

① reading through all of it, taking notes
② this answer message, with a few comments on some things,
  while ignoring some other things altogether
③ another answer message tackling those things, after I
  ponder this some more (it *is* a brave new world you opened!)

The result of #1+#2 follows.


Daniel Richard G. dixit:

>To return to your question, while conversion with iconv(1) is available,
>you can see it's far from the most convenient approach.

OK. Conversion with something like it, then, anyway.

>> - stuff like print/printf \x4F is expected to output '|' not 'O'
>
>Yep! Just tried it in the shell:
>
>    $ printf '\x4F\n'
>    |

OK. Just what I thought.

>> - what about \u20AC? UTF-8? UTF-EBCDIC?
>
>Many code pages have received euro-sign updates; e.g. EBCDIC 924 is

I wasn’t actually asking about Euro support here, but deeper…

>Locale support in z/OS is like it was in Linux over a decade ago: If
>you're a U.S. user, use the default code page; if you're a Russian user,
>use a Russian code page, and so on... and all code pages are 8 bits.

… *shudder* (and OK @ not using UTF-EBCDIC)…

>> - keyboard input is in EBCDIC?
>
>I worked by way of SSH'ing in to the z/OS OMVS Unix environment.
>Everything in OMVS is EBCDIC, but of course my SSH client sends and
>receives everything in ASCII. There is a network translation layer in
>between, apart from the file-content conversion layers previously
>mentioned, that makes it all work transparently.

… *UGH!* That’s the hard thing.

Actually, does your SSH client send/receive in ASCII, or in latin1
or some other ASCII-based codepage? What does this layer use?

Though, that is almost certainly irrelevant for mksh, I see from #1.

>A "real" mainframe connection, however, would be through TN3270, using
>the x3270 program or the like. Then the conversion is happening on the
>client side. But this is not relevant to mksh, because you don't get the

OK.

>> - is there anything that allows Unicode input?
>
>From the keyboard? I've not seen anything suggesting this is possible.
>Even IBM's z/OS Unicode support via UTF-16 is, as far as I can tell, for
>use by applications and not by logged-in users.

OK.

This would mean completely removing utf8-mode from the shell.
That’s a deeper incision than I originally thought would
be required.

>My understanding of why things like locale/encoding support on the
>console/terminal aren't up to snuff on z/OS is that this would only
>benefit the crusty mainframe operators, who are comparatively small in
>number compared to the user base of the application(s) running on the
>system. At the same time, there is z/Linux (Linux on the mainframe), and

I see.

z/Linux is “something like Debian/s390 and Debian/s390x”, then?
(In that case: mksh works perfectly well there.)

>return ASCII. You end up with an ASCII application, basically, even
>though the source and environment aren't.)

That makes this pretty useless for us… except (see below).

>I, too, take portability seriously :)

Glad to see! ;)

>> • host OS (machine running Build.sh): ASCII-based or EBCDIC?
>
>Perhaps the "printf '\x4F'" thing can be used to detect an EBCDIC build

No, printf is unportable, but maybe echo something | tr a-z A-Z,
whose output should differ. Though I recall at least one system
not supporting ranges in tr, so this is more like a “check whether
the output is what tr on an EBCDIC system with range support would
produce, and treat everything else as ASCII” thing, I guess.

>> • target OS (machine running the mksh/lksh binary created):
>>   z/OS ASCII, z/OS EBCDIC, or anything else?
>
>There is also the matter of the EBCDIC variant. Of the EBCDIC code
>pages that contain all of ASCII, the characters are generally
>assigned consistently to the same codepoints. But one exception
>occurs between EBCDICs 1047 and 037, which assign '[', ']', and '^'
>differently---characters that are significant to the shell.
>
>(EBCDIC 037 is likely to be the second-most-popular code page after
>1047, and is in fact the x3270 default.)

Yeowch!

>I don't think it's feasible to have a single mksh binary support
>multiple EBCDIC variants, however, so IMO this matter is best left to
>the user's discretion in what CFLAGS they provide (-qconvlit option). As
>long as the code specifies these characters literally instead of
>numerically, everything should fall in line.

… sounds like a maintenance nightmare. But probably doable,
if we enumerate the set of options (to a carefully chosen,
small number).

>The Build.sh code wouldn't be able to suss out the signals any better if
>it knew about these that are unique to z/OS? IBM might add even more
>signals down the line, after all...

I don’t think so; at least NSIG should be precise, especially
if at least one of sys_siglist, sys_signame and strsignal exists.

You could experiment at runtime. Just kill(2) something with
all numbers and see if high numbers give different errors;
maybe the OS says “signal number too high”, then we get a clue.

>I do have access to modern AIX systems with xlc. Ohhh, I'm seeing some
>nastiness there:
>
>    "rlimits.gen", line 20.20: 1506-191 (E) The character # is not a
>    valid C source character.

May be related to a bug in the shell running Build.sh.
Try removing all backslash+newline sequences from the *.opt
files first, as I do in the mksh-R*.tgz distfiles. (read
without -r is supposed to do that, but apparently, enough
shells fail it; mid- to long-term, there is no way around
compiling a C tool on the host to generate some things for
mksh ☹)

>I can't say whether that's the case in the TN3270 environment, but if
>I'm connected via SSH, then my terminal is just a standard TERM=xterm

Right.

>terminal window that responds accordingly. Of course, this is going
>through the automatic EBCDIC<->ASCII conversion; not just printable
>characters but also control characters get translated.

OK.

>> So it’s possible to use ASCII on the system, but atypical?
>
>Right. In an EBCDIC environment, it's not terribly useful:
>
>    $ ./mksh -c 'echo zzz'; echo
>    :::

… catching the thread from above again… this sounds like
we’d want to completely ignore it…

>I'm looking into setting up an environment that is all-ASCII, starting
>from /bin/sh (hence why I'm here)

… except for this thing.

I’d hope ASCII mksh would work as-is on your system, but
apparently you’d need a bunch of tools working in ASCII
mode first before you can confirm that, e.g. by running
the testsuite ;)

Sounds like an interesting (positive sense) hacking goal!
You, Sir, fit right into the mksh crowd (I did the Plan 9
port, as well as others, but lewellyn and RT from IRC have
ported mksh to many funny and obscure platforms)!

>> Huh, Pascal anyone? :)
>
>I figured, Perl and Python have picked it up, why not here too? ;)

I was always more partial to ASC() and CHR$()… *ducks and hides*

>> >+++ edit.c (back to patch order)
>>
>> Here’s where we start going into Unicode land. This file is the one
>> that assumes UTF-8 the most.
>
>But that need not be active, right? I saw UTFMODE in run-time
>conditionals all over the place.

At the current point in time, yes.

I hope to be able to make the entirety of edit.c, plus good parts
of lex.c and syn.c and some parts of tree.c, use 16-bit Unicode
internally.

On the other hand, we’d just need a different 8-to-16-bit
conversion function, and back, for EBCDIC. (The bad part
about *this* is that it complicates supporting various
EBCDIC variants. The good part is that we could possibly
use system library code for this, which we cannot in the
POSIX/ASCII world.)

>I'm very happy not to have to figure _everything_ out =)

Just ask ;)

I’ll do my part too and review parts of mksh for gems
like those, post-R51 though.

>practical concern... I mean, if you could support MacOS 9 with its
>CR-only line endings just by tweaking a few lines of code, surely that
>would be desirable, even if it makes an exception to the "consistent
>across all platforms" property? In reality, it would require a lot more
>invasive changes (not least due to the lack of POSIX), and I'd
>anticipate _that_ to be the main objection to integrating such support.

Absolutely not! A port of mksh to MacRelix is underway.
I just require it to use Unix line endings ;) (otherwise, we’d
have another fork on our hands, but lewellyn seems to agree)

MacRelix is something like Cygwin for System 7, AIUI.

>> You’ll be the maintainer of something we call mksh/zOS, or something
[…]
>Ooof... a long-term commitment like this is going to be hard for me. For
>my part, porting mksh is just one piece of a larger puzzle I'm building.
>I'd like to leave things in good shape here, and then move on to other
>areas of investigation.

OK. We’ll have to think about this. I’ll let this stand, for now.

>My hope is that EBCDIC support can be integrated by generalizing code
>that assumes ASCII, and adding a minimum of code that is specific to
>EBCDIC (which is why my patch all but apologizes for the tables needed
>for Ctrl-key mapping). The one change in my patch that best exemplifies
>the approach I have in mind is
>
>    -               if ((s[0] | 0x20) == 'x') {
>    +               if (s[0] == 'x' || s[0] == 'X') {

Right.

>That not only makes the code EBCDIC-compatible, it arguably makes it
>clearer. Not a separate section of code prone to bit rot, but simply a
>particular discipline applied to the common/platform-independent code.

Meh. I grew up with ASM (besides GW-BASIC). And some mksh targets
suffer from stupid compilers, like pcc; there, this does make a
difference. But, being honest with myself, other things would make
so much more of a difference than completely hiding this would…
thinking of an optimised rotation implementation (we’re going to
need this for hash table lookups, which will only become more
numerous) for example… which is not going to happen, as it’d
involve writing asm code for every target that can’t use the
common C part, and I don’t want to go in that direction for mksh.

So, I don’t quite agree with the reasoning, but I fully agree with
the direction this takes. (I had actually thought about this very
line, and wondered if I could make this into a macro… a bit more
obscure, but keeping the ASCII version more compact; I like small code.)

>> • or it could be a bunch of #ifdefs
>
>I would hope for the latter, keeping the set of #ifdefs as small and
>manageable as possible.

Hm.

Without going further into this, let me throw another idea into
the room:

a separate repository, containing those things that would make
the ifdefs too numerous, into which I’d merge each release,
possibly even more often, and from which EBCDIC releases are
cut. It could even be git, in case people like that more; the
cvs-to-git conversion appears to be pretty stable.

>Fair enough; I think that the changes needed to support EBCDIC would
>weigh in a lot lighter than those needed to support Windows natively.
>The same POSIX API is used, after all---the worst of it is the Ctrl-key
>mapping tables.

Yes and no. Michael managed to hide much in a library.

I think you can’t really compare them. The worst thing
on Win32 is fork emulation (though we got that done);
that is a completely different beast and has much more
impact on the user.

>Will mksh continue to support ISO 8859-1 (Latin-1) environments

mksh has never supported latin-1 (or any 8-bit codepage/SBCS,
or DBCS) environments, period.

mksh is always: ASCII, possibly UTF-8/CESU-8, but 8-bit transparent.

>As long as UTF-8 mode isn't outright required, and bit 7 is left
>alone, I don't see that EBCDIC systems have much to worry about.

I think I’ll eventually require 8<->16-bit conversion routines.
I currently supply them in expr.c (“UTF-8 support code: low-level
functions”) for the ASCII/UTF-8 case.

If all used codepages have a mapping for all possible octets,
and there are system functions we can use for this, we probably
should do so.

This is, however, strengthening my (tentative) resolution to
make this into a separate product. This removes certain promises
the shell offers to scripts that they can rely on, and a lot of
functionality.

>Well... you really think the changes are extensive enough to
>warrant a fork?

In some cases, I’d wish for a fork, not something shipped
with the main code, even over a one-liner. Opening files as
O_TEXT instead of O_BINARY on Win32, for example.

Deciding point here is the API exposed to the scripts (or the
interactive user), really.

I’m a bit unfair here, because lksh is included in the main
distribution, but is such a thing. Maybe, if the ifdefs don’t
get too many, we could ship it with the main tarball, but
require a specific Build.sh option to enable it. Like lksh.

The bikeshed question is, what to name it?
mksh/EBCDIC? mksh/zOS? mksh/OS390? Or what?
What should its KSH_VERSION look like⁴, and
do you want the mksh/lksh distinction⁵ too?

④ btw, is dot.mkshrc usable in your environment…
  once we get the bugs out, that is?

⑤ mostly, lksh uses POSIX arithmetic, whereas
  mksh uses safe arithmetic (guaranteed 32-bit,
  with wraparound; the signed arithmetics in
  shell are actually emulated using uint32_t in
  C code, plus it has guarantees for e.g. shift
  right, mostly like the 80386 works, and it can
  rotate left/right)

>I am quite agreeable to BSD-ish terms, and in any event I hereby
>explicitly agree to release all of my work relating to mksh under the
>same license terms as mksh itself.

Thanks. (I’d prefer to just “shut up and hack” myself, but…)

>> ① I have once, on an OSI mailing list, stated requirements for a
[…]

Well, just ignore this. As I said, it’ll probably not happen.
I’m mostly happy with the current text, and it’s a work of art
(note the justified paragraphs). With a few lines commenting
on intent (which I probably should write down, normatively),
it’ll work out.


>> Urgh. I’m rambling again. Sorry about that.
>
>Well, you've read this far, so at least you're game as well ;)

I a̲m̲ impressed if anyone else is following us so far ☻


Good night,
//mirabilos
-- 
(gnutls can also be used, but if you are compiling lynx for your own use,
there is no reason to consider using that package)
        -- Thomas E. Dickey on the Lynx mailing list, about OpenSSL
