On Sat, 2017 Apr 22 23:26+0000, Thorsten Glaser wrote:
>
> >Oh, so you mean like if(c=='[') and such? That is certainly
> >reasonable. The program would be tied to the compile-time codepage no
> >worse than most other programs.
>
> Right. So either something like -DMKSH_EBCDIC_CP=1047 or limiting
> EBCDIC support to precisely one codepage.
I don't think the former sort of directive should be necessary. There is
enough auto-conversion magic going on (the compiler already encodes
character literals in the compile-time codepage) that it should be
possible to piggyback on that, so that it all "just works" when you
compile the code.
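A minimal sketch of the literal-only style, assuming nothing beyond standard C (the function names are hypothetical, not mksh's). Every case label is a character literal, so the compiler encodes it in the compile-time codepage. A range test such as c >= 'A' && c <= 'Z' is not safe on EBCDIC, where non-letters sit in the gaps after I and after R. Digits are the exception, since C guarantees '0' through '9' are contiguous.

```c
#include <stdbool.h>

/* Hypothetical helper: uppercase test written only with character
 * literals, so each case label is encoded in the execution codepage
 * (ASCII, EBCDIC 1047, ...) and no numeric range is assumed.
 * It takes a plain char, so the comparison matches the literals whether
 * char is signed or unsigned (with signed EBCDIC chars, 'A' is negative). */
static bool
is_upper_literal(char c)
{
	switch (c) {
	case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': case 'G':
	case 'H': case 'I': case 'J': case 'K': case 'L': case 'M': case 'N':
	case 'O': case 'P': case 'Q': case 'R': case 'S': case 'T': case 'U':
	case 'V': case 'W': case 'X': case 'Y': case 'Z':
		return true;
	default:
		return false;
	}
}

/* Digits are the one range the C standard guarantees to be contiguous,
 * so this range test is safe in any codepage. */
static bool
is_digit_literal(char c)
{
	return c >= '0' && c <= '9';
}
```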
> >(If you could do everything in terms of character literals, without
> >depending on constructs like if(c>='A'&&c<='Z'), your code would be
> >pretty much EBCDIC-proof.)
>
> Yesss… but…
>
> ① not all characters are in every codepage, and
True, but the ASCII repertoire should be a given. (Some older EBCDIC
codepages lack certain common characters, though I forget which ones;
no one will want to use those anyway.)
> ② I need strictly monotonic ordering for all 256 possible octets,
> e.g. for sorting strings in some cases and for [a-z] ranges
That sounds no worse than what is usually done for LC_COLLATE and
such...
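A sketch of one way to get a strict order over all 256 octets, assuming ASCII order is what's wanted for the ASCII repertoire (whether mksh should sort in ASCII or native EBCDIC order is a design choice this doesn't settle). The table and function names are hypothetical. The printable ASCII characters are listed as a string literal in ASCII order, so the compiler supplies their native codes; every other octet is then given a leftover rank, which keeps the table a permutation.

```c
#include <string.h>

/* Printable ASCII, written in ASCII order. The compiler encodes each
 * character in the execution codepage, so this string lists the native
 * codes of the ASCII repertoire in ASCII collation order. */
static const char ascii_order[] =
    " !\"#$%&'()*+,-./0123456789:;<=>?@"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`"
    "abcdefghijklmnopqrstuvwxyz{|}~";

/* Hypothetical collation table: rank[b] is the sort key of octet b. */
static unsigned char rank[256];

static void
init_rank(void)
{
	unsigned char used[256] = { 0 };
	size_t i, n = strlen(ascii_order);
	unsigned int b, r;

	/* Printable ASCII characters get their ASCII code (0x20..0x7E)
	 * as rank, whatever their native value is. */
	for (i = 0; i < n; i++) {
		unsigned char nat = (unsigned char)ascii_order[i];
		rank[nat] = (unsigned char)(0x20 + i);
		used[nat] = 1;
	}
	/* Every other octet gets one of the remaining ranks (0x00..0x1F,
	 * then 0x7F..0xFF) in native order, so rank[] stays a permutation
	 * of 0..255 and the ordering is strict over all 256 octets. */
	r = 0;
	for (b = 0; b < 256; b++) {
		if (used[b])
			continue;
		if (r == 0x20)
			r = 0x7F;
		rank[b] = (unsigned char)r++;
	}
}

/* strcmp()-like comparison using rank[] instead of raw octet values. */
static int
rank_cmp(const char *a, const char *b)
{
	const unsigned char *p = (const unsigned char *)a;
	const unsigned char *q = (const unsigned char *)b;

	while (*p != '\0' && *p == *q) {
		p++;
		q++;
	}
	return (int)rank[*p] - (int)rank[*q];
}
```

On an ASCII host this table comes out as the identity. A bracket range [x-y] could then be tested as rank[x] <= rank[c] && rank[c] <= rank[y] (indices as unsigned char), so [a-z] would match only the 26 lowercase letters even on EBCDIC.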
> OK, I can live with that, so I just need to swap the conversion tables
> I have (which map 0x15 to NEL and 0x25 to LF).
Always thought it was funny that it's the weirdo mainframe platform
that has a proper "newline" character instead of pressing LF into
service as one ^_^
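For the table swap itself, a sketch assuming a pair of hypothetical 256-entry tables, e2a (EBCDIC to Latin-1) and a2e (its inverse), taken from the usual IBM-1047 mapping in which EBCDIC 0x15 maps to NEL (0x85) and 0x25 maps to LF (0x0A). Exchanging the two forward entries, and fixing up the inverse, makes the compiler's '\n' (0x15 on z/OS) come out as LF.

```c
/* Hypothetical tables: e2a maps an EBCDIC octet to Latin-1 and a2e is
 * its inverse. Both are assumed to start from a stock IBM-1047 mapping
 * where 0x15 -> 0x85 (NEL) and 0x25 -> 0x0A (LF). */
#define EBC_NL 0x15
#define EBC_LF 0x25

static void
swap_nl_lf(unsigned char e2a[256], unsigned char a2e[256])
{
	unsigned char t;

	/* Exchange the forward entries: 0x15 now maps to LF, 0x25 to NEL. */
	t = e2a[EBC_NL];
	e2a[EBC_NL] = e2a[EBC_LF];
	e2a[EBC_LF] = t;

	/* Keep the inverse table consistent with the new forward mapping. */
	a2e[e2a[EBC_NL]] = EBC_NL;
	a2e[e2a[EBC_LF]] = EBC_LF;
}
```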
> > #pragma convert("ISO8859-1")
> […]
> >That may or may not be useful. Of course, the pragma would need to be
>
> Interesting, but I can’t think of where that would be useful at the
> moment. But good to know.
>
> Hmm. Can this be used to construct the table?
>
> Something like running this at configure time:
>
> #include <stdio.h>
>
> int main(void) {
> int i = 1;
>
> printf("#pragma convert(\"ISO8859-1\")\n");
> printf("static const unsigned char map[] = \"");
> while (i <= 255)
> printf("%c", i++);
> printf("\";\n");
> return 0;
> }
>
> And then feed its output into the compiling, and have
> some code generating the reverse map like:
>
> i = 0;
> while (i < 255) {
> revmap[map[i]] = i + 1;
> i++;
> }
>
> But this reeks of fragility compared with supporting a known-good
> hand-edited set of codepages.
Probably easier just to use etoa() or atoe()? I don't think explicit
hand-edited tables should be needed for EBCDIC, unless you're already
doing those for other encodings.
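Whichever way the forward table ends up being produced (generated at configure time, hand-edited, or filled in via etoa()/atoe()), deriving the reverse table from it and checking that the forward table is a bijection catches a bad table early; a strictly monotonic ordering over all 256 octets needs exactly that. The quoted map[] starts at octet 1 (a NUL cannot appear inside the string), hence the i + 1 in the loop above; this sketch, with a hypothetical function name, assumes a full 256-entry table instead.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: fill rev[] so that rev[fwd[i]] == i for every
 * octet i. Returns false if fwd[] is not a permutation of 0..255, i.e.
 * two octets map to the same value and no inverse exists. */
static bool
build_reverse_map(const unsigned char fwd[256], unsigned char rev[256])
{
	bool seen[256];
	int i;

	memset(seen, 0, sizeof(seen));
	for (i = 0; i < 256; i++) {
		if (seen[fwd[i]])
			return false;	/* duplicate target value */
		seen[fwd[i]] = true;
		rev[fwd[i]] = (unsigned char)i;
	}
	/* 256 distinct values out of 256 possible: every slot of rev[]
	 * was written exactly once. */
	return true;
}
```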
> (Not to say we can’t do this manually once in order to actually _get_
> those mappings.)
Certainly the above code would either need some tweaking, or the output
some massaging, so that the odd characters (especially '"' and '\', plus
control characters such as newline) don't throw off the compiler.
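One way to do that massaging is to escape what a C string literal cannot hold as-is. The sketch below (function name hypothetical) escapes '"' and '\' with a backslash and writes every non-printable octet as a three-digit octal escape. Caveat: C defines a numeric escape as the exact value of the character, so under #pragma convert those octets would presumably not be converted the way raw characters are, and the control-character entries of the generated table would need checking separately. That is something to verify against the XL C documentation rather than assume.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: write the n octets of in[] into out as the body
 * of a C string literal (without the surrounding quotes). out must have
 * room for 4 * n + 1 bytes, the worst case when every octet becomes an
 * octal escape. Returns the length written, excluding the final NUL. */
static size_t
escape_c_literal(const unsigned char *in, size_t n, char *out)
{
	size_t i, len = 0;

	for (i = 0; i < n; i++) {
		int c = in[i];

		if (c == (unsigned char)'"' || c == (unsigned char)'\\') {
			/* These would end the literal or start an escape. */
			out[len++] = '\\';
			out[len++] = (char)c;
		} else if (isprint(c)) {
			out[len++] = (char)c;
		} else {
			/* Control and other non-printable octets, including the
			 * newline, which may not appear raw in a literal. Exactly
			 * three octal digits keep a following digit from being
			 * absorbed into the escape. */
			len += (size_t)sprintf(out + len, "\\%03o",
			    (unsigned int)c);
		}
	}
	out[len] = '\0';
	return len;
}
```

The generator would then print the static const unsigned char map[] = " prefix, the escaped body, and the closing "; as before.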
> >Let me know if I can help any more!
>
> Okay, sure, thanks. I must admit I’m still not actively working on
> this, but I’m considering making a separate branch on which we can try
> things until they work, then merge it back.
I'm happy to test iterations of this, as long as it doesn't need much
diagnosing...
> But first, the character class changes themselves. That turned out to
> be quite a bit more effort than I had estimated and will keep me busy
> for another longish hacking session. Ugh. Oh well. But on the plus
> side, this will make support much nicer, as *all* constructs like
> “(c >= '0' && c <= '9')” will go away, and even the OS/2 TEXTMODE line
> endings (where CR+LF is also supported) will need less cpp hackery.
Sounds great! That'll certainly make EBCDIC easier to deal with.
I might suggest looking at Gnulib, specifically lib/c-ctype.h, for
inspiration. I helped them get their ctype implementation in order on
z/OS (and at one point we were even trying to deal with *signed* EBCDIC
chars, where 'A' has a negative value!), and it works solidly now.
They've got a good design for dealing with non-ASCII weirdness; they
were clearly thinking of that from the start.
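The table-driven character classes described above can be sketched so that the table itself is filled from string literals at startup, which makes it codepage-neutral in the same way as the literals. The class names, bit values and init function here are hypothetical, not mksh's or Gnulib's.

```c
/* Hypothetical class bits; a real shell would need many more. */
#define CC_DIGIT 0x01
#define CC_UPPER 0x02
#define CC_LOWER 0x04
#define CC_SPACE 0x08

static unsigned char cclass[256];

/* OR bit into the entry of every character of the literal s. Because s
 * is a literal, its octets are whatever the compile-time codepage says. */
static void
cc_set(const char *s, unsigned char bit)
{
	for (; *s != '\0'; s++)
		cclass[(unsigned char)*s] |= bit;
}

static void
cc_init(void)
{
	cc_set("0123456789", CC_DIGIT);
	cc_set("ABCDEFGHIJKLMNOPQRSTUVWXYZ", CC_UPPER);
	cc_set("abcdefghijklmnopqrstuvwxyz", CC_LOWER);
	cc_set(" \t\n\v\f\r", CC_SPACE);
}

/* One lookup replaces a chain like (c >= '0' && c <= '9'). The cast
 * makes negative (signed char) values index the table correctly. */
#define cc_is(c, bits) ((cclass[(unsigned char)(c)] & (bits)) != 0)
```

Since the classes are bit masks, a combined test such as cc_is(c, CC_UPPER | CC_LOWER) covers "alphabetic" with the same single lookup.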
Happy hacking,
--Daniel
--
Daniel Richard G.
My ASCII-art .sig got a bad case of Times New Roman.