Hello list, I'd like to submit some changes that add support for IBM z/OS mainframe systems (specifically, for mksh running in the OMVS Unix environment), including compatibility with EBCDIC.
The test suite tallies up as follows in an EBCDIC run:

    Total failed: 52
    (4 ignored)
    (48 unexpected)
    Total passed: 446

This work is not complete, then, as there are many ASCII'isms not yet
conditionalized in the code. Primarily, EBCDIC has the normal
[0-9A-Za-z] characters beyond 0x80, so it is not possible to set the
high bit for signalling purposes---which mksh seems to do a lot of.
Addressing this will require greater familiarity with the program than
I can muster for a few days' work.

The following files are attached:

1. Patch against current CVS
2. Output of running Build.sh, gzip'ed
3. Output of running test.sh, gzip'ed

Below is a walk-through of the changes in the patch. I'll elide most
things that are self-explanatory or repeated:

+++ Build.sh

* Added clauses for TARGET_OS == "OS/390"

* '\012\015' != '\n\r' on this platform, so use the latter

* When compiling with -g, xlc produces a .dbg file alongside each
  object file, so clean those up

* NSIG is, amazingly, not #defined on this platform. Sure would be nice
  if the fancy logic that calculates NSIG could conditionally #define
  it, rather than a TARGET_OS conditional... :-)

* Check whether "nroff -c" is supported---the system I'm using has GNU
  nroff 1.17, which doesn't have -c

* On this platform, xlc -qflag=... takes only one suboption, not two

* Some special flags are needed for xlc on z/OS that are not needed on
  AIX, like to make missing #include files an error instead of a
  warning (!). Conversely, most of those AIX xlc flags are not
  recognized

* Added a note that EBCDIC has \047 as the escape character rather than
  \033

+++ check.pl

* I was getting a parse error with an expected-exit value of "e != 0",
  and adding \d to this regex fixed things... this wasn't breaking for
  other folks?

+++ check.t

* The "cd-pe" test fails on this system (perhaps it should be
  disabled?) and the directories were not getting cleaned up properly

+++ sh.h (out of patch order, as other files use these changes)

* Wall-o'-text about EBCDIC for the newbies

* Added logic to detect and enable MKSH_EBCDIC automatically. Here I
  rely on detecting the IBM compiler and __CHARSET_LIB; an alternate
  (if more verbose) approach is the way C_CTYPE_ASCII is defined in
  https://github.com/c9/node-gnu-tools/blob/master/grep-src/lib/c-ctype.h

* If compiling in ASCII mode, #define _ENHANCED_ASCII_EXT so that as
  many C/system calls are switched to ASCII as possible (this is
  something I was experimenting with, but it's not how most people
  would be building/using mksh on this system)

* Define symbols for some common character/string escape literals so we
  can swap them out easily

* Because EBCDIC characters like 'A' will have a negative value if
  signed chars are being used, #define the ORD() macro so we can always
  get an integer value in [0, 255]

* Added EBCDIC-compatible versions of the ksh_isXXXXX() and
  ksh_toXXXXX() macros

* Define EBCDIC-compatible versions of the CTRL(), UNCTRL() and
  ISCTRL() macros. Because the C0 control characters are mapped
  directly from ASCII printable characters, I figured a
  full-nine-yards translation table was needed for CTRL() and UNCTRL(),
  and so those are implemented as functions for speed. Because a
  function cannot be used in a switch statement nor (for xlc) array
  initialization, I added a new CCTRL() macro (the extra "C" is for
  "constant") that contains an expression that can be evaluated at
  compile time. To keep this macro at a reasonable size, however, it
  only accepts one "slice" of ASCII characters---CCTRL('A') should be
  used instead of CCTRL('a'), and so on. I implemented ISCTRL() as a
  macro, and this definition is valid, but there are many more control
  characters than there are ASCII C0 codes. See the commented-out
  definition of ebcdic_isctrl() in misc.c for an alternate
  implementation

* The ebcdic_*() functions should probably be renamed to better fit
  mksh conventions

+++ edit.c (back to patch order)

* I don't understand exactly what is_mfs() is used for, but I'm pretty
  sure we can't do the & 0x80 with EBCDIC (note that e.g. 'A' == 0xC1)

* Use CCTRL() instead of CTRL() in this array initialization. I did not
  adjust the indentation of the adjacent lines to keep the patch size
  down, but that will be desirable

* Don't know much about XFUNC_VALUE(), but that & 0x7F looks un-kosher
  for EBCDIC

* Don't use lowercase letters in CCTRL()

+++ main.c

* Need to initialize the EBCDIC escape translation tables at startup

+++ misc.c

* IBM z/OS already has a function named cclass():

    $ grep cclass /usr/include/*.h
    /usr/include/collate.h: int cclass(char *, collel_t **);

  The signature clash was causing the build to break

* Don't assume the A-Z alphabet is contiguous

* Relocated the comment about escape portability to sh.h

* Control-key mapping escape table implementation, including the bit of
  Perl I used to generate the actual mapping code. The control code for
  '?' is handled as a special case (but could be incorporated into the
  Perl if desired)

* Note the commented-out ebcdic_isctrl() function. This may or may not
  be preferable to the EBCDIC ISCTRL() macro currently in sh.h. Be
  aware that this function will return true for a lot fewer inputs than
  the macro

+++ var.c

* Check for upper/lowercase 'X' without resorting to ASCII trickery

* Use the ORD() macro so that these subtractions don't inadvertently
  become additions

I will be happy to provide further testing and answer any questions as
needed.

--Daniel

P.S.: Please Cc: me in any replies, as I am not subscribed to this list.

-- 
Daniel Richard G. || sk...@iskunk.org
My ASCII-art .sig got a bad case of Times New Roman.
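To illustrate the ORD() note in the sh.h section above, here is a minimal standalone sketch (not code from the patch). The ORD() definition is copied from the sh.h hunk; code_point() and naive_code_point() are hypothetical helpers that exist only for this illustration. It assumes an 8-bit char, and that converting 0xC1 to signed char wraps to a negative value, which is implementation-defined but the usual behaviour.

```c
/* Same definition as the ORD() macro the patch adds to sh.h. */
#define ORD(c)	((int)(unsigned char)(c))

/*
 * Hypothetical helper, not part of the patch: code point of a character
 * held in a signed char, always in [0, 255] thanks to ORD().
 */
static int
code_point(signed char c)
{
	return ORD(c);
}

/*
 * Hypothetical helper, not part of the patch: the naive conversion.
 * Any byte with the high bit set comes back as a negative number,
 * which is what turns a subtraction like (c - 'A') into an addition.
 */
static int
naive_code_point(signed char c)
{
	return c;
}
```

With 0xC1 (the value the edit.c notes give for 'A' in EBCDIC), code_point() yields 193, while naive_code_point() yields -63 on such a platform, which is where the (c + 63) in the sh.h comment comes from.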
Index: Build.sh
===================================================================
RCS file: /cvs/src/bin/mksh/Build.sh,v
retrieving revision 1.674
diff -u -r1.674 Build.sh
--- Build.sh	19 Apr 2015 18:50:59 -0000	1.674
+++ Build.sh	24 Apr 2015 02:12:22 -0000
@@ -419,7 +419,11 @@
 		na=0
 	fi
 	hf=$1; shift
-	hv=`echo "$hf" | tr -d '\012\015' | tr -c $alll$allu$alln $alls`
+	case "$TARGET_OS" in
+	OS/390)	lfcr='\n\r' ;; # EBCDIC goofiness
+	*)	lfcr='\012\015' ;;
+	esac
+	hv=`echo "$hf" | tr -d "$lfcr" | tr -c $alll$allu$alln $alls`
 	echo "/* NeXTstep bug workaround */" >x
 	for i
 	do
@@ -577,7 +581,7 @@
 	echo "$me: Error: ./$tfn is a directory!" >&2
 	exit 1
 fi
-rmf a.exe* a.out* conftest.c *core core.* lft ${tfn}* no *.bc *.ll *.o *.gen \
+rmf a.exe* a.out* conftest.c *core core.* lft ${tfn}* no *.bc *.dbg *.ll *.o *.gen \
     Rebuild.sh signames.inc test.sh x vv.out
 
 SRCS="lalloc.c eval.c exec.c expr.c funcs.c histrap.c jobs.c"
@@ -829,6 +833,12 @@
 OpenBSD)
 	: ${HAVE_SETLOCALE_CTYPE=0}
 	;;
+OS/390)
+	SIZE=: # not available
+	add_cppflags -DNSIG=32
+	add_cppflags -D_ALL_SOURCE
+	oswarn="; EBCDIC support is incomplete"
+	;;
 OSF1)
 	HAVE_SIG_T=0 # incompatible
 	add_cppflags -D_OSF_SOURCE
@@ -929,6 +939,7 @@
 
 : ${AWK=awk} ${CC=cc} ${NROFF=nroff} ${SIZE=size}
 test 0 = $r && echo | $NROFF -v 2>&1 | grep GNU >/dev/null 2>&1 && \
+    echo | $NROFF -c >/dev/null 2>&1 && \
     NROFF="$NROFF -c"
 
 # this aids me in tracing FTBFSen without access to the buildd
@@ -1327,8 +1338,16 @@
 	DOWARN=-Wc,-we
 	;;
 xlc)
-	save_NOWARN=-qflag=i:e
-	DOWARN=-qflag=i:i
+	case "$TARGET_OS" in
+	OS/390)
+		save_NOWARN=-qflag=e
+		DOWARN=-qflag=i
+		;;
+	*)
+		save_NOWARN=-qflag=i:e
+		DOWARN=-qflag=i:i
+		;;
+	esac
 	;;
 *)
 	test x"$save_NOWARN" = x"" && save_NOWARN=-Wno-error
@@ -1493,10 +1512,25 @@
 	ac_flags 1 extansi -Xa
 	;;
 xlc)
-	ac_flags 1 rodata "-qro -qroconst -qroptr"
-	ac_flags 1 rtcheck -qcheck=all
-	#ac_flags 1 rtchkc -qextchk # reported broken
-	ac_flags 1 wformat "-qformat=all -qformat=nozln"
+	case "$TARGET_OS" in
+	OS/390)
+		# On IBM z/OS, the following are warnings by default
+		# CCN3296: #include file <foo.h> not found.
+		# CCN3944: Attribute "__foo__" is not supported and is ignored.
+		# CCN3963: The attribute "foo" is not a valid variable
+		# attribute and is ignored.
+		ac_flags 1 halton "-qhaltonmsg=CCN3296 -qhaltonmsg=CCN3944 -qhaltonmsg=CCN3963"
+		# CCN3290: Unknown macro name FOO on #undef directive.
+		# CCN4108: The use of keyword '__attribute__' is non-portable.
+		ac_flags 1 supprss "-qsuppress=CCN3290 -qsuppress=CCN4108"
+		;;
+	*)
+		ac_flags 1 rodata "-qro -qroconst -qroptr"
+		ac_flags 1 rtcheck -qcheck=all
+		#ac_flags 1 rtchkc -qextchk # reported broken
+		ac_flags 1 wformat "-qformat=all -qformat=nozln"
+		;;
+	esac
 	#ac_flags 1 wp64 -qwarn64 # too verbose for now
 	;;
 esac
@@ -2628,8 +2662,8 @@
 MKSH_ASSUME_UTF8		(0=disabled, 1=enabled; default: unset)
 MKSH_BINSHPOSIX			if */sh or */-sh, enable set -o posix
 MKSH_BINSHREDUCED		if */sh or */-sh, enable set -o sh
-MKSH_CLRTOEOL_STRING		"\033[K"
-MKSH_CLS_STRING			"\033[;H\033[J"
+MKSH_CLRTOEOL_STRING		"\033[K" (replace \033 with \047 on EBCDIC)
+MKSH_CLS_STRING			"\033[;H\033[J" (likewise)
 MKSH_CONSERVATIVE_FDS		fd 0-9 for scripts, shell only up to 31
 MKSH_DEFAULT_EXECSHELL		"/bin/sh" (do not change)
 MKSH_DEFAULT_PROFILEDIR		"/etc" (do not change)
Index: check.pl
===================================================================
RCS file: /cvs/src/bin/mksh/check.pl,v
retrieving revision 1.38
diff -u -r1.38 check.pl
--- check.pl	8 Mar 2015 22:54:55 -0000	1.38
+++ check.pl	24 Apr 2015 02:12:22 -0000
@@ -1165,7 +1165,7 @@
 		print STDERR "$prog:$test{':long-name'}: expected-exit value $val not in 0..255\n";
 		return undef;
 	    }
-	} elsif ($val !~ /^([\s<>+-=*%\/&|!()]|\b[wse]\b|\bSIG[A-Z][A-Z0-9]*\b)+$/) {
+	} elsif ($val !~ /^([\s\d<>+-=*%\/&|!()]|\b[wse]\b|\bSIG[A-Z][A-Z0-9]*\b)+$/) {
 	    print STDERR "$prog:$test{':long-name'}: bad expected-exit expression: $val\n";
 	    return undef;
 	}
Index: check.t
===================================================================
RCS file: /cvs/src/bin/mksh/check.t,v
retrieving revision 1.690
diff -u -r1.690 check.t
--- check.t	19 Apr 2015 19:18:27 -0000	1.690
+++ check.t	24 Apr 2015 02:12:24 -0000
@@ -1216,7 +1216,7 @@
 	cd -P$1 subdir
 	echo 2=$?,${PWD#$bwd/}
 	cd $bwd
-	chmod 755 renamed
+	chmod 755 noread renamed 2>/dev/null
 	rm -rf noread link renamed
 stdin:
 	export TSHELL="$__progname"
Index: edit.c
===================================================================
RCS file: /cvs/src/bin/mksh/edit.c,v
retrieving revision 1.284
diff -u -r1.284 edit.c
--- edit.c	11 Apr 2015 22:10:12 -0000	1.284
+++ edit.c	24 Apr 2015 02:12:25 -0000
@@ -37,10 +37,10 @@
  * which do a full power cycle then...
  */
 #ifndef MKSH_CLS_STRING
-#define MKSH_CLS_STRING		"\033[;H\033[J"
+#define MKSH_CLS_STRING		MKSH_ESCAPE_STRING "[;H" MKSH_ESCAPE_STRING "[J"
 #endif
 #ifndef MKSH_CLRTOEOL_STRING
-#define MKSH_CLRTOEOL_STRING	"\033[K"
+#define MKSH_CLRTOEOL_STRING	MKSH_ESCAPE_STRING "[K"
 #endif
 
 /* tty driver characters we are interested in */
@@ -885,7 +885,11 @@
 /* Separator for completion */
 #define	is_cfs(c)	((c) == ' ' || (c) == '\t' || (c) == '"' || (c) == '\'')
 /* Separator for motion */
-#define	is_mfs(c)	(!(ksh_isalnux(c) || (c) == '$' || ((c) & 0x80)))
+#ifdef MKSH_EBCDIC
+# define	is_mfs(c)	(!(ksh_isalnux(c) || (c) == '$'))
+#else
+# define	is_mfs(c)	(!(ksh_isalnux(c) || (c) == '$' || ((c) & 0x80)))
+#endif
 
 #define X_NTABS		3	/* normal, meta1, meta2 */
 #define X_TABSZ		256	/* size of keydef tables etc */
@@ -1010,56 +1014,56 @@
 };
 
 static struct x_defbindings const x_defbindings[] = {
-	{ XFUNC_del_back,		0, CTRL('?')	},
-	{ XFUNC_del_bword,		1, CTRL('?')	},
-	{ XFUNC_eot_del,		0, CTRL('D')	},
-	{ XFUNC_del_back,		0, CTRL('H')	},
-	{ XFUNC_del_bword,		1, CTRL('H')	},
+	{ XFUNC_del_back,		0, CCTRL('?')	},
+	{ XFUNC_del_bword,		1, CCTRL('?')	},
+	{ XFUNC_eot_del,		0, CCTRL('D')	},
+	{ XFUNC_del_back,		0, CCTRL('H')	},
+	{ XFUNC_del_bword,		1, CCTRL('H')	},
 	{ XFUNC_del_bword,		1, 'h'		},
 	{ XFUNC_mv_bword,		1, 'b'		},
 	{ XFUNC_mv_fword,		1, 'f'		},
 	{ XFUNC_del_fword,		1, 'd'		},
-	{ XFUNC_mv_back,		0, CTRL('B')	},
-	{ XFUNC_mv_forw,		0, CTRL('F')	},
-	{ XFUNC_search_char_forw,	0, CTRL(']')	},
-	{ XFUNC_search_char_back,	1, CTRL(']')	},
-	{ XFUNC_newline,		0, CTRL('M')	},
-	{ XFUNC_newline,		0, CTRL('J')	},
-	{ XFUNC_end_of_text,		0, CTRL('_')	},
-	{ XFUNC_abort,			0, CTRL('G')	},
-	{ XFUNC_prev_com,		0, CTRL('P')	},
-	{ XFUNC_next_com,		0, CTRL('N')	},
-	{ XFUNC_nl_next_com,		0, CTRL('O')	},
-	{ XFUNC_search_hist,		0, CTRL('R')	},
+	{ XFUNC_mv_back,		0, CCTRL('B')	},
+	{ XFUNC_mv_forw,		0, CCTRL('F')	},
+	{ XFUNC_search_char_forw,	0, CCTRL(']')	},
+	{ XFUNC_search_char_back,	1, CCTRL(']')	},
+	{ XFUNC_newline,		0, CCTRL('M')	},
+	{ XFUNC_newline,		0, CCTRL('J')	},
+	{ XFUNC_end_of_text,		0, CCTRL('_')	},
+	{ XFUNC_abort,			0, CCTRL('G')	},
+	{ XFUNC_prev_com,		0, CCTRL('P')	},
+	{ XFUNC_next_com,		0, CCTRL('N')	},
+	{ XFUNC_nl_next_com,		0, CCTRL('O')	},
+	{ XFUNC_search_hist,		0, CCTRL('R')	},
 	{ XFUNC_beg_hist,		1, '<'		},
 	{ XFUNC_end_hist,		1, '>'		},
 	{ XFUNC_goto_hist,		1, 'g'		},
-	{ XFUNC_mv_end,			0, CTRL('E')	},
-	{ XFUNC_mv_begin,		0, CTRL('A')	},
-	{ XFUNC_draw_line,		0, CTRL('L')	},
-	{ XFUNC_cls,			1, CTRL('L')	},
-	{ XFUNC_meta1,			0, CTRL('[')	},
-	{ XFUNC_meta2,			0, CTRL('X')	},
-	{ XFUNC_kill,			0, CTRL('K')	},
-	{ XFUNC_yank,			0, CTRL('Y')	},
+	{ XFUNC_mv_end,			0, CCTRL('E')	},
+	{ XFUNC_mv_begin,		0, CCTRL('A')	},
+	{ XFUNC_draw_line,		0, CCTRL('L')	},
+	{ XFUNC_cls,			1, CCTRL('L')	},
+	{ XFUNC_meta1,			0, CCTRL('[')	},
+	{ XFUNC_meta2,			0, CCTRL('X')	},
+	{ XFUNC_kill,			0, CCTRL('K')	},
+	{ XFUNC_yank,			0, CCTRL('Y')	},
 	{ XFUNC_meta_yank,		1, 'y'		},
-	{ XFUNC_literal,		0, CTRL('^')	},
+	{ XFUNC_literal,		0, CCTRL('^')	},
 	{ XFUNC_comment,		1, '#'		},
-	{ XFUNC_transpose,		0, CTRL('T')	},
-	{ XFUNC_complete,		1, CTRL('[')	},
-	{ XFUNC_comp_list,		0, CTRL('I')	},
+	{ XFUNC_transpose,		0, CCTRL('T')	},
+	{ XFUNC_complete,		1, CCTRL('[')	},
+	{ XFUNC_comp_list,		0, CCTRL('I')	},
 	{ XFUNC_comp_list,		1, '='		},
 	{ XFUNC_enumerate,		1, '?'		},
 	{ XFUNC_expand,			1, '*'		},
-	{ XFUNC_comp_file,		1, CTRL('X')	},
-	{ XFUNC_comp_comm,		2, CTRL('[')	},
+	{ XFUNC_comp_file,		1, CCTRL('X')	},
+	{ XFUNC_comp_comm,		2, CCTRL('[')	},
 	{ XFUNC_list_comm,		2, '?'		},
-	{ XFUNC_list_file,		2, CTRL('Y')	},
+	{ XFUNC_list_file,		2, CCTRL('Y')	},
 	{ XFUNC_set_mark,		1, ' '		},
-	{ XFUNC_kill_region,		0, CTRL('W')	},
-	{ XFUNC_xchg_point_mark,	2, CTRL('X')	},
-	{ XFUNC_literal,		0, CTRL('V')	},
-	{ XFUNC_version,		1, CTRL('V')	},
+	{ XFUNC_kill_region,		0, CCTRL('W')	},
+	{ XFUNC_xchg_point_mark,	2, CCTRL('X')	},
+	{ XFUNC_literal,		0, CCTRL('V')	},
+	{ XFUNC_version,		1, CCTRL('V')	},
 	{ XFUNC_prev_histword,		1, '.'		},
 	{ XFUNC_prev_histword,		1, '_'		},
 	{ XFUNC_set_arg,		1, '0'		},
@@ -1126,7 +1130,7 @@
 	}
 }
 
-#ifdef MKSH_SMALL
+#if defined(MKSH_SMALL) || defined(MKSH_EBCDIC)
 #define XFUNC_VALUE(f) (f)
 #else
 #define XFUNC_VALUE(f) (f & 0x7F)
@@ -2885,7 +2889,7 @@
 		} else
 			x_putc(c);
 		switch (c) {
-		case 7:
+		case MKSH_BELL_CHAR:
 			break;
 		case '\r':
 		case '\n':
@@ -2921,7 +2925,7 @@
 			x_putc(c);
 		}
 		switch (c) {
-		case 7:
+		case MKSH_BELL_CHAR:
 			break;
 		case '\r':
 		case '\n':
@@ -3976,7 +3980,7 @@
 	case '\n':
 		return (1);
 
-	case CTRL('['):
+	case CCTRL('['):
 		expanded = NONE;
 		if (first_insert) {
 			first_insert = false;
@@ -3994,19 +3998,19 @@
 		return (redo_insert(lastac - 1));
 
 	/* { Begin nonstandard vi commands */
-	case CTRL('x'):
+	case CCTRL('X'):
 		expand_word(0);
 		break;
 
-	case CTRL('f'):
+	case CCTRL('F'):
 		complete_word(0, 0);
 		break;
 
-	case CTRL('e'):
+	case CCTRL('E'):
 		print_expansions(es, 0);
 		break;
 
-	case CTRL('i'):
+	case CCTRL('I'):
 		if (Flag(FVITABCOMPLETE)) {
 			complete_word(0, 0);
 			break;
@@ -4061,8 +4065,8 @@
 	}
 
 	switch (*cmd) {
-	case CTRL('l'):
-	case CTRL('r'):
+	case CCTRL('L'):
+	case CCTRL('R'):
 		redraw_line(true);
 		break;
 
@@ -4249,7 +4253,7 @@
 
 	case 'j':
 	case '+':
-	case CTRL('n'):
+	case CCTRL('N'):
 		if (grabhist(modified, hnum + argcnt) < 0)
 			return (-1);
 		else {
@@ -4260,7 +4264,7 @@
 
 	case 'k':
 	case '-':
-	case CTRL('p'):
+	case CCTRL('P'):
 		if (grabhist(modified, hnum - argcnt) < 0)
 			return (-1);
 		else {
@@ -4492,26 +4496,26 @@
 	/* AT&T ksh */
 	case '=':
 	/* Nonstandard vi/ksh */
-	case CTRL('e'):
+	case CCTRL('E'):
 		print_expansions(es, 1);
 		break;
 
 
 	/* Nonstandard vi/ksh */
-	case CTRL('i'):
+	case CCTRL('I'):
 		if (!Flag(FVITABCOMPLETE))
 			return (-1);
 		complete_word(1, argcnt);
 		break;
 
 	/* some annoying AT&T kshs */
-	case CTRL('['):
+	case CCTRL('['):
 		if (!Flag(FVIESCCOMPLETE))
 			return (-1);
 	/* AT&T ksh */
 	case '\\':
 	/* Nonstandard vi/ksh */
-	case CTRL('f'):
+	case CCTRL('F'):
 		complete_word(1, argcnt);
 		break;
 
@@ -4519,7 +4523,7 @@
 	/* AT&T ksh */
 	case '*':
 	/* Nonstandard vi/ksh */
-	case CTRL('x'):
+	case CCTRL('X'):
 		expand_word(1);
 		break;
 
@@ -4598,7 +4602,7 @@
 		break;
 
 	case 'h':
-	case CTRL('h'):
+	case CCTRL('H'):
 		if (!sub && es->cursor == 0)
 			return (-1);
 		ncursor = es->cursor - argcnt;
Index: main.c
===================================================================
RCS file: /cvs/src/bin/mksh/main.c,v
retrieving revision 1.292
diff -u -r1.292 main.c
--- main.c	19 Apr 2015 18:51:01 -0000	1.292
+++ main.c	24 Apr 2015 02:12:25 -0000
@@ -282,6 +282,10 @@
 
 	initctypes();
 
+#ifdef MKSH_EBCDIC
+	initebcdic();
+#endif
+
 	inittraps();
 	coproc_init();
 
Index: misc.c
===================================================================
RCS file: /cvs/src/bin/mksh/misc.c,v
retrieving revision 1.226
diff -u -r1.226 misc.c
--- misc.c	20 Mar 2015 21:47:04 -0000	1.226
+++ misc.c	24 Apr 2015 02:12:25 -0000
@@ -52,7 +52,7 @@
     const unsigned char *, bool) MKSH_A_PURE;
 static int do_gmatch(const unsigned char *, const unsigned char *,
     const unsigned char *, const unsigned char *) MKSH_A_PURE;
-static const unsigned char *cclass(const unsigned char *, unsigned char)
+static const unsigned char *mksh_cclass(const unsigned char *, unsigned char)
     MKSH_A_PURE;
 #ifdef KSH_CHVT_CODE
 static void chvt(const Getopt *);
@@ -93,12 +93,8 @@
 void
 initctypes(void)
 {
-	int c;
-
-	for (c = 'a'; c <= 'z'; c++)
-		chtypes[c] |= C_ALPHA;
-	for (c = 'A'; c <= 'Z'; c++)
-		chtypes[c] |= C_ALPHA;
+	setctypes("abcdefghijklmnopqrstuvwxyz", C_ALPHA);
+	setctypes("ABCDEFGHIJKLMNOPQRSTUVWXYZ", C_ALPHA);
 	chtypes['_'] |= C_ALPHA;
 	setctypes("0123456789", C_DIGIT);
 	/* \0 added automatically */
@@ -776,7 +772,7 @@
 		}
 		switch (*p++) {
 		case '[':
-			if (sc == 0 || (p = cclass(p, sc)) == NULL)
+			if (sc == 0 || (p = mksh_cclass(p, sc)) == NULL)
 				return (0);
 			break;
 
@@ -889,7 +885,7 @@
 }
 
 static const unsigned char *
-cclass(const unsigned char *p, unsigned char sub)
+mksh_cclass(const unsigned char *p, unsigned char sub)
 {
 	unsigned char c, d;
 	bool notp, found = false;
@@ -1159,7 +1155,7 @@
 			++p;
 			switch (c) {
 			/* see unbksl() in this file for comments */
-			case 7:
+			case MKSH_BELL_CHAR:
 				c = 'a';
 				if (0)
 					/* FALLTHROUGH */
@@ -1183,11 +1179,11 @@
 				c = 't';
 				if (0)
 					/* FALLTHROUGH */
-			case 11:
+			case MKSH_VTAB_CHAR:
 				c = 'v';
 				if (0)
 					/* FALLTHROUGH */
-			case '\033':
+			case MKSH_ESCAPE_CHAR:
 				/* take E not e because \e is \ in *roff */
 				c = 'E';
 				/* FALLTHROUGH */
@@ -1197,7 +1193,11 @@
 				if (0)
 					/* FALLTHROUGH */
 			default:
+#ifdef MKSH_EBCDIC
+				if (c < 64 || c == 0xFF) {
+#else
 				if (c < 32 || c > 0x7E) {
+#endif
 					/* FALLTHROUGH */
 			case '\'':
 					shf_fprintf(shf, "\\%03o", c);
@@ -2136,13 +2136,7 @@
 	fc = (*fg)();
 	switch (fc) {
 	case 'a':
-		/*
-		 * according to the comments in pdksh, \007 seems
-		 * to be more portable than \a (due to HP-UX cc,
-		 * Ultrix cc, old pcc, etc.) so we avoid the escape
-		 * sequence altogether in mksh and assume ASCII
-		 */
-		wc = 7;
+		wc = MKSH_BELL_CHAR;
 		break;
 	case 'b':
 		wc = '\b';
@@ -2155,7 +2149,7 @@
 		break;
 	case 'E':
 	case 'e':
-		wc = 033;
+		wc = MKSH_ESCAPE_CHAR;
 		break;
 	case 'f':
 		wc = '\f';
@@ -2170,8 +2164,7 @@
 		wc = '\t';
 		break;
 	case 'v':
-		/* assume ASCII here as well */
-		wc = 11;
+		wc = MKSH_VTAB_CHAR;
 		break;
 	case '1':
 	case '2':
@@ -2253,3 +2246,144 @@
 
 	return (wc);
 }
+
+#ifdef MKSH_EBCDIC
+
+/*
+ * The mapping of control keys to C0 control codes (e.g. Ctrl-D => EOT)
+ * is directly tied to ASCII, so for EBCDIC we have no option but to use
+ * conversion tables if we want to keep the same mapping.
+ *
+ * The mappings are generated with the following Perl code:
+ *
+--------------------------------
+#!/usr/bin/env perl
+
+use Text::Iconv;
+use strict;
+
+my $cvt = Text::Iconv->new("iso8859-1", "ibm-1047");
+
+sub fmtchr($)
+{
+	my $n = $_[0];
+	my $c = chr($n);
+	$c = "\\$c" if $c =~ /['\\]/;
+	$c = "\\177" if $n eq 0x7F;
+	return $c;
+}
+
+# generate ebcdic_ctrl_map[][] array
+for (my $i = 0x40; $i <= 0x5F; $i++)
+{
+	my $a = chr($i & 0x1f);
+	my $e = $cvt->convert($a);
+	printf "\t{ '%s', 0x%02X }, { '%s', 0x%02X }, { '%s', 0x%02X },\n",
+		fmtchr($i), ord($e),
+		fmtchr($i - 0x20), ord($e),
+		fmtchr($i + 0x20), ord($e);
+}
+
+# generate CCTRL() macro
+for (my $i = 0x41; $i <= 0x5F; $i++)
+{
+	my $a = chr($i & 0x1f);
+	my $e = $cvt->convert($a);
+	printf "\t\t\t (x) == '%s' ? 0x%02X : \\\n", fmtchr($i), ord($e);
+}
+--------------------------------
+ *
+ */
+
+static const int ebcdic_ctrl_map[][2] = {
+	/*
+	 * ebcdic_ctrl_map[x][0] = ASCII control key
+	 * ebcdic_ctrl_map[x][1] = EBCDIC control code
+	 *
+	 * Example: Ctrl-D -> 'D' -> ASCII 0x04 -> EOT -> EBCDIC 0x37
+	 *
+	 * Column 1 represents the standard ASCII control keys
+	 * Column 2 is the col. 1 set shifted minus 0x20
+	 * Column 3 is the col. 1 set shifted plus 0x20
+	 *
+	 * Note: '?' is special-cased below
+	 */
+	{ '@', 0x00 }, { ' ', 0x00 }, { '`', 0x00 },
+	{ 'A', 0x01 }, { '!', 0x01 }, { 'a', 0x01 },
+	{ 'B', 0x02 }, { '"', 0x02 }, { 'b', 0x02 },
+	{ 'C', 0x03 }, { '#', 0x03 }, { 'c', 0x03 },
+	{ 'D', 0x37 }, { '$', 0x37 }, { 'd', 0x37 },
+	{ 'E', 0x2D }, { '%', 0x2D }, { 'e', 0x2D },
+	{ 'F', 0x2E }, { '&', 0x2E }, { 'f', 0x2E },
+	{ 'G', 0x2F }, { '\'',0x2F }, { 'g', 0x2F },
+	{ 'H', 0x16 }, { '(', 0x16 }, { 'h', 0x16 },
+	{ 'I', 0x05 }, { ')', 0x05 }, { 'i', 0x05 },
+	{ 'J', 0x25 }, { '*', 0x25 }, { 'j', 0x25 },
+	{ 'K', 0x0B }, { '+', 0x0B }, { 'k', 0x0B },
+	{ 'L', 0x0C }, { ',', 0x0C }, { 'l', 0x0C },
+	{ 'M', 0x0D }, { '-', 0x0D }, { 'm', 0x0D },
+	{ 'N', 0x0E }, { '.', 0x0E }, { 'n', 0x0E },
+	{ 'O', 0x0F }, { '/', 0x0F }, { 'o', 0x0F },
+	{ 'P', 0x10 }, { '0', 0x10 }, { 'p', 0x10 },
+	{ 'Q', 0x11 }, { '1', 0x11 }, { 'q', 0x11 },
+	{ 'R', 0x12 }, { '2', 0x12 }, { 'r', 0x12 },
+	{ 'S', 0x13 }, { '3', 0x13 }, { 's', 0x13 },
+	{ 'T', 0x3C }, { '4', 0x3C }, { 't', 0x3C },
+	{ 'U', 0x3D }, { '5', 0x3D }, { 'u', 0x3D },
+	{ 'V', 0x32 }, { '6', 0x32 }, { 'v', 0x32 },
+	{ 'W', 0x26 }, { '7', 0x26 }, { 'w', 0x26 },
+	{ 'X', 0x18 }, { '8', 0x18 }, { 'x', 0x18 },
+	{ 'Y', 0x19 }, { '9', 0x19 }, { 'y', 0x19 },
+	{ 'Z', 0x3F }, { ':', 0x3F }, { 'z', 0x3F },
+	{ '[', 0x27 }, { ';', 0x27 }, { '{', 0x27 },
+	{ '\\',0x1C }, { '<', 0x1C }, { '|', 0x1C },
+	{ ']', 0x1D }, { '=', 0x1D }, { '}', 0x1D },
+	{ '^', 0x1E }, { '>', 0x1E }, { '~', 0x1E },
+	{ '_', 0x1F }, { '?', 0x1F }, { '\177', 0x1F },
+	{ '\0', 0 }
+};
+
+static int ebcdic_ctrl_table[UCHAR_MAX] = { 0 };
+static int ebcdic_unctrl_table[UCHAR_MAX] = { 0 };
+
+void
+initebcdic(void)
+{
+	int i;
+
+	for (i = 0; i < UCHAR_MAX; i++) {
+		ebcdic_ctrl_table[i] = 0;
+		ebcdic_unctrl_table[i] = 0;
+	}
+
+	for (i = 0; ebcdic_ctrl_map[i][0]; i++) {
+		int ch = ORD(ebcdic_ctrl_map[i][0]);
+		int cc = ebcdic_ctrl_map[i][1];
+		ebcdic_ctrl_table[ch] = cc;
+		if (ch % 3 == 0)
+			ebcdic_unctrl_table[cc] = ch;
+	}
+
+	/* special case */
+	ebcdic_ctrl_table[ORD('?')] = 0x07;
+	ebcdic_unctrl_table[0x07] = '?';
+}
+
+int ebcdic_ctrl(int c)
+{
+	return ebcdic_ctrl_table[ORD(c)];
+}
+
+int ebcdic_unctrl(int c)
+{
+	return ebcdic_unctrl_table[ORD(c)];
+}
+
+/*
+int ebcdic_isctrl(int c)
+{
+	return c == 0 || ebcdic_unctrl_table[ORD(c)] != 0;
+}
+*/
+
+#endif /* MKSH_EBCDIC */
Index: sh.h
===================================================================
RCS file: /cvs/src/bin/mksh/sh.h,v
retrieving revision 1.725
diff -u -r1.725 sh.h
--- sh.h	19 Apr 2015 19:18:31 -0000	1.725
+++ sh.h	24 Apr 2015 02:12:26 -0000
@@ -251,6 +251,64 @@
 
 #ifndef MKSH_INCLUDES_ONLY
 
+/*
+ * Many headaches with EBCDIC:
+ * 1. There are numerous EBCDIC variants, and it is not feasible for us
+ *    to support them all. But we can support the EBCDIC code pages that
+ *    contain all (most?) of the characters in ASCII, and these
+ *    thankfully tend to agree on the code points assigned to the ASCII
+ *    subset. If you need a representative example, look at EBCDIC 1047,
+ *    which is first among equals in the IBM MVS development
+ *    environment: http://en.wikipedia.org/wiki/EBCDIC_1047
+ * 2. Character ranges that are contiguous in ASCII, like the letters
+ *    in [A-Z], are broken up into segments (i.e. [A-IJ-RS-Z]), so we
+ *    can't implement e.g. islower() as { return c >= 'a' && c <= 'z'; }
+ *    because it will also return true for a handful of extraneous
+ *    characters (like the plus-minus sign at 0x8F in EBCDIC 1047, a
+ *    little after 'i'). But at least '_' is not one of these.
+ * 3. The normal [0-9A-Za-z] characters are at codepoints beyond 0x80.
+ *    Not only do they require all 8 bits instead of 7, if chars are
+ *    signed, they will have negative integer values! Something like
+ *    (c - 'A') could actually become (c + 63)! Use the ORD() macro to
+ *    ensure you're getting a value in [0, 255].
+ * 4. '\n' is actually NL (0x15, U+0085) instead of LF (0x25, U+000A).
+ *    EBCDIC has a proper newline character instead of "emulating" one
+ *    with line feeds.
+ * 5. Note that it is possible to compile programs in ASCII mode on IBM
+ *    mainframe systems, using the -qascii option to the XL C compiler.
+ *    We can determine the build mode by looking at __CHARSET_LIB:
+ *    0 == EBCDIC, 1 == ASCII
+ */
+#if defined(__MVS__) && defined(__IBMC__) && !defined(MKSH_EBCDIC)
+# if defined(__CHARSET_LIB) && __CHARSET_LIB
+#  ifndef _ENHANCED_ASCII_EXT
+#   define _ENHANCED_ASCII_EXT 0xFFFFFFFF /* go all-out on ASCII */
+#  endif
+# else
+#  define MKSH_EBCDIC
+# endif
+#endif
+
+#ifdef MKSH_EBCDIC
+/*
+ * use symbolic escapes when possible, let the compiler sort it out
+ */
+# define MKSH_BELL_CHAR		'\a'
+# define MKSH_ESCAPE_CHAR	'\047'
+# define MKSH_ESCAPE_STRING	"\047"
+# define MKSH_VTAB_CHAR		'\v'
+#else
+/*
+ * according to the comments in pdksh, \007 seems to be more portable
+ * than \a (due to HP-UX cc, Ultrix cc, old pcc, etc.) so we avoid the
+ * latter altogether when we're using ASCII
+ */
+# define MKSH_BELL_CHAR		'\007'
+# define MKSH_ESCAPE_CHAR	'\033'
+# define MKSH_ESCAPE_STRING	"\033"
+# define MKSH_VTAB_CHAR		'\013'
+#endif
+
 /* extra types */
 
 #if !HAVE_GETRUSAGE
@@ -298,13 +356,24 @@
 } while (/* CONSTCOND */ 0)
 #endif
 
-#define ksh_isdigit(c)	(((c) >= '0') && ((c) <= '9'))
-#define ksh_islower(c)	(((c) >= 'a') && ((c) <= 'z'))
-#define ksh_isupper(c)	(((c) >= 'A') && ((c) <= 'Z'))
-#define ksh_tolower(c)	(((c) >= 'A') && ((c) <= 'Z') ? (c) - 'A' + 'a' : (c))
-#define ksh_toupper(c)	(((c) >= 'a') && ((c) <= 'z') ? (c) - 'a' + 'A' : (c))
+#define ORD(c)	((int)(unsigned char)(c))
+#ifdef MKSH_EBCDIC
+# define ksh_isdigit(c)	(ORD(c) >= ORD('0') && ORD(c) <= ORD('9'))
+# define ksh_islower(c)	(ORD(c) >= ORD('a') && ORD(c) <= ORD('z') && \
+			    ksh_isalphx((c)))
+# define ksh_isupper(c)	(ORD(c) >= ORD('A') && ORD(c) <= ORD('Z') && \
+			    ksh_isalphx((c)))
+# define ksh_isspace(c)	((c) == '\t' || (c) == '\n' || (c) == '\v' || \
+			    (c) == '\f' || (c) == '\r' || (c) == ' ')
+#else
+# define ksh_isdigit(c)	(((c) >= '0') && ((c) <= '9'))
+# define ksh_islower(c)	(((c) >= 'a') && ((c) <= 'z'))
+# define ksh_isupper(c)	(((c) >= 'A') && ((c) <= 'Z'))
+# define ksh_isspace(c)	((((c) >= 0x09) && ((c) <= 0x0D)) || ((c) == 0x20))
+#endif
+#define ksh_tolower(c)	(ksh_isupper((c)) ? (c) - 'A' + 'a' : (c))
+#define ksh_toupper(c)	(ksh_islower((c)) ? (c) - 'a' + 'A' : (c))
 #define ksh_isdash(s)	(((s)[0] == '-') && ((s)[1] == '\0'))
-#define ksh_isspace(c)	((((c) >= 0x09) && ((c) <= 0x0D)) || ((c) == 0x20))
 #define ksh_min(x,y)	((x) < (y) ? (x) : (y))
 #define ksh_max(x,y)	((x) > (y) ? (x) : (y))
 
@@ -460,7 +529,7 @@
  * not a char that is used often. Also, can't use the high bit as it causes
  * portability problems (calling strchr(x, 0x80 | 'x') is error prone).
  */
-#define MAGIC		(7)	/* prefix for *?[!{,} during expand */
+#define MAGIC		(MKSH_BELL_CHAR)	/* prefix for *?[!{,} during expand */
 #define ISMAGIC(c)	((unsigned char)(c) == MAGIC)
 
 EXTERN const char *safe_prompt; /* safe prompt if PS1 substitution fails */
@@ -1583,9 +1652,49 @@
 #define HERES		10	/* max number of << in line */
 
 #undef CTRL
-#define CTRL(x)		((x) == '?' ? 0x7F : (x) & 0x1F)	/* ASCII */
-#define UNCTRL(x)	((x) ^ 0x40)	/* ASCII */
-#define ISCTRL(x)	(((signed char)((uint8_t)(x) + 1)) < 33)
+#ifdef MKSH_EBCDIC
+# define CTRL(x)	(ebcdic_ctrl(x))
+# define CCTRL(x)	((x) == '?' ? 0x07 : /* special case */ \
+			 (x) == 'A' ? 0x01 : \
+			 (x) == 'B' ? 0x02 : \
+			 (x) == 'C' ? 0x03 : \
+			 (x) == 'D' ? 0x37 : \
+			 (x) == 'E' ? 0x2D : \
+			 (x) == 'F' ? 0x2E : \
+			 (x) == 'G' ? 0x2F : \
+			 (x) == 'H' ? 0x16 : \
+			 (x) == 'I' ? 0x05 : \
+			 (x) == 'J' ? 0x25 : \
+			 (x) == 'K' ? 0x0B : \
+			 (x) == 'L' ? 0x0C : \
+			 (x) == 'M' ? 0x0D : \
+			 (x) == 'N' ? 0x0E : \
+			 (x) == 'O' ? 0x0F : \
+			 (x) == 'P' ? 0x10 : \
+			 (x) == 'Q' ? 0x11 : \
+			 (x) == 'R' ? 0x12 : \
+			 (x) == 'S' ? 0x13 : \
+			 (x) == 'T' ? 0x3C : \
+			 (x) == 'U' ? 0x3D : \
+			 (x) == 'V' ? 0x32 : \
+			 (x) == 'W' ? 0x26 : \
+			 (x) == 'X' ? 0x18 : \
+			 (x) == 'Y' ? 0x19 : \
+			 (x) == 'Z' ? 0x3F : \
+			 (x) == '[' ? 0x27 : \
+			 (x) == '\\'? 0x1C : \
+			 (x) == ']' ? 0x1D : \
+			 (x) == '^' ? 0x1E : \
+			 (x) == '_' ? 0x1F : \
+			 0)
+# define UNCTRL(x)	(ebcdic_unctrl(x))
+# define ISCTRL(x)	(ORD(x) < 64 || ORD(x) == 0xFF)
+#else
+# define CTRL(x)	((x) == '?' ? 0x7F : (x) & 0x1F)
+# define CCTRL(x)	CTRL(x)
+# define UNCTRL(x)	((x) ^ 0x40)
+# define ISCTRL(x)	(((signed char)((uint8_t)(x) + 1)) < 33)
+#endif
 
 #define IDENT	64
 
@@ -1884,6 +1993,12 @@
 char *strndup_i(const char *, size_t, Area *);
 #endif
 int unbksl(bool, int (*)(void), void (*)(int));
+#ifdef MKSH_EBCDIC
+void initebcdic(void);
+int ebcdic_ctrl(int);
+int ebcdic_unctrl(int);
+int ebcdic_isctrl(int);
+#endif
 /* shf.c */
 struct shf *shf_open(const char *, int, int, int);
 struct shf *shf_fdopen(int, int, struct shf *);
Index: var.c
===================================================================
RCS file: /cvs/src/bin/mksh/var.c,v
retrieving revision 1.190
diff -u -r1.190 var.c
--- var.c	19 Apr 2015 18:51:02 -0000	1.190
+++ var.c	24 Apr 2015 02:12:26 -0000
@@ -510,7 +510,7 @@
 	}
 
 	if (c == '0' && arith) {
-		if ((s[0] | 0x20) == 'x') {
+		if (s[0] == 'x' || s[0] == 'X') {
 			/* interpret as hexadecimal */
 			base = 16;
 			++s;
@@ -553,12 +553,12 @@
 			continue;
 		}
 		if (ksh_isdigit(c))
-			c -= '0';
+			c -= ORD('0');
 		else {
-			c |= 0x20;
+			c = ksh_tolower(c);
 			if (!ksh_islower(c))
 				return (-1);
-			c -= 'a' - 10;
+			c -= ORD('a') - 10;
 		}
 		if (c >= base)
 			return (-1);
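A side note on the var.c hunk above: the subtraction c -= ORD('a') - 10 still relies on the letters being contiguous, which the sh.h comment says is not the case across the whole alphabet in EBCDIC. One charset-independent alternative is to look each digit up in a literal string, so the compiler supplies whatever code points the execution character set uses. The sketch below is not part of the patch; digit_value() is a hypothetical name, and it assumes bases from 2 to 36.

```c
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: value of the digit
 * character c in the given base (2..36), or -1 if c is not a valid
 * digit in that base. The lookup goes through literal alphabets, so
 * nothing is assumed about where the letters sit in the character set.
 */
static int
digit_value(int c, int base)
{
	static const char lower[] = "0123456789abcdefghijklmnopqrstuvwxyz";
	static const char upper[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
	const char *p;
	int v;

	if (c == '\0')
		return -1;	/* strchr() would match the terminator */
	if ((p = strchr(lower, c)) != NULL)
		v = (int)(p - lower);
	else if ((p = strchr(upper, c)) != NULL)
		v = (int)(p - upper);
	else
		return -1;
	return v < base ? v : -1;
}
```

The trade-off is a linear scan per character instead of a single subtraction; whether that matters for arithmetic parsing in the shell is a judgment call for someone who knows the code better.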
build.txt.gz
Description: application/gzip
test.txt.gz
Description: application/gzip