Hello list, I'd like to submit some changes that add support for IBM z/OS mainframe systems (specifically, for mksh running in the OMVS Unix environment), including compatibility with EBCDIC.
The test suite tallies up as follows in an EBCDIC run:
Total failed: 52 (4 ignored) (48 unexpected)
Total passed: 446
This work is therefore not complete, as there are many ASCII-isms in the
code that are not yet conditionalized. Primarily, EBCDIC places the
normal [0-9A-Za-z] characters above 0x80, so it is not possible to set
the high bit for signalling purposes, which mksh seems to do a lot.
Addressing this will require greater familiarity with the program than
I could muster in a few days' work.
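To make the high-bit problem concrete, here is a minimal sketch. The
code points are hard-coded from the EBCDIC 1047 chart (an assumption;
other EBCDIC pages may differ) so that it behaves the same on an ASCII
build host, and the names are made up for illustration. It shows that
an ASCII-style "is bit 7 set?" flag test fires for ordinary EBCDIC
letters and digits, but not for space.

```c
#include <assert.h>
#include <stdbool.h>

/* A few IBM-1047 (EBCDIC) code points, hard-coded so the sketch
 * behaves the same when compiled on an ASCII host. */
enum {
	E1047_A     = 0xC1,	/* 'A' */
	E1047_a     = 0x81,	/* 'a' */
	E1047_ZERO  = 0xF0,	/* '0' */
	E1047_SPACE = 0x40	/* ' ' */
};

/* ASCII-style "signalling": mark a character by setting bit 7. */
#define FLAG_BIT 0x80

/* True if c would be mistaken for a flagged character. */
static bool
looks_flagged(unsigned char c)
{
	return (c & FLAG_BIT) != 0;
}
```

In ASCII the same test is false for 'A' (0x41), which is why the trick
works there and nowhere else.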
The following files are attached:
1. Patch against current CVS
2. Output of running Build.sh, gzipped
3. Output of running test.sh, gzipped
Below is a walk-through of the changes in the patch. I'll elide most
things that are self-explanatory or repeated:
+++ Build.sh
* Added clauses for TARGET_OS == "OS/390"
* '\012\015' != '\n\r' on this platform, so use the latter
* When compiling with -g, xlc produces a .dbg file alongside each object
file, so clean those up
* NSIG is, amazingly, not #defined on this platform. It sure would be
nice if the fancy logic that calculates NSIG could conditionally
#define it, rather than relying on a TARGET_OS conditional... :-)
* Check whether "nroff -c" is supported: the system I'm using has GNU
nroff 1.17, which doesn't have -c
* On this platform, xlc -qflag=... takes only one suboption, not two
* Some special flags are needed for xlc on z/OS that are not needed on
AIX, such as ones that make a missing #include file an error instead
of a warning (!). Conversely, most of the AIX xlc flags are not
recognized on z/OS
* Added a note that EBCDIC uses \047 (0x27) as the ESC character
rather than \033
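Since only the ESC byte differs between the two character sets, the
terminal control strings can be built by string-literal concatenation
around a single ESC definition, which is the approach the edit.c hunk
takes. A minimal sketch, reusing the MKSH_EBCDIC and MKSH_ESCAPE_STRING
names from the patch:

```c
#include <assert.h>
#include <string.h>

#ifdef MKSH_EBCDIC
# define MKSH_ESCAPE_STRING "\047"	/* ESC is 0x27 in EBCDIC 1047 */
#else
# define MKSH_ESCAPE_STRING "\033"	/* ESC is 0x1B in ASCII */
#endif

/*
 * Adjacent string literals are concatenated at compile time, so on an
 * ASCII build these expand to exactly the old "\033..." literals.
 */
#define MKSH_CLS_STRING      MKSH_ESCAPE_STRING "[;H" MKSH_ESCAPE_STRING "[J"
#define MKSH_CLRTOEOL_STRING MKSH_ESCAPE_STRING "[K"
```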
+++ check.pl
* I was getting a parse error with an expected-exit value of "e != 0",
and adding \d to this regex fixed things... this wasn't breaking for
other folks?
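One plausible explanation for why this didn't bite anyone else: inside
a Perl character class, the unescaped "+-=" is a range rather than
three literal characters. With ASCII code points that range runs from
'+' (0x2B) to '=' (0x3D) and so happens to include the digits (0x30 to
0x39); with EBCDIC 1047 code points it runs from 0x4E to 0x7E, and the
digits (0xF0 to 0xF9) fall outside it. The arithmetic can be checked
with a few lines of C (code points hard-coded, EBCDIC values assuming
IBM-1047):

```c
#include <assert.h>
#include <stdbool.h>

/* Code points of the characters involved, per code page. */
struct charset {
	unsigned char plus, equals, digit0, digit9;
};

static const struct charset ascii   = { 0x2B, 0x3D, 0x30, 0x39 };
static const struct charset ibm1047 = { 0x4E, 0x7E, 0xF0, 0xF9 };

/* Does the class range '+' .. '=' cover all of '0' .. '9'? */
static bool
range_covers_digits(const struct charset *cs)
{
	return cs->digit0 >= cs->plus && cs->digit9 <= cs->equals;
}
```

Escaping the hyphen (\-) would remove the accidental range altogether,
though the pattern would then still need \d.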
+++ check.t
* The "cd-pe" test fails on this system (perhaps it should be
disabled?), and its directories were not getting cleaned up properly
afterwards
+++ sh.h (out of patch order, as other files use these changes)
* Wall-o'-text about EBCDIC for the newbies
* Added logic to detect and enable MKSH_EBCDIC automatically. Here I
rely on detecting the IBM compiler and __CHARSET_LIB; an alternate
(if more verbose) approach is the way C_CTYPE_ASCII is defined in
https://github.com/c9/node-gnu-tools/blob/master/grep-src/lib/c-ctype.h
* If compiling in ASCII mode, #define _ENHANCED_ASCII_EXT so that as
many C/system calls are switched to ASCII as possible (this is
something I was experimenting with, but it's not how most people would
be building/using mksh on this system)
* Define symbols for some common character/string escape literals so we
can swap them out easily
* Because EBCDIC characters like 'A' will have a negative value if
signed chars are being used, #define the ORD() macro so we can always
get an integer value in [0, 255]
* Added EBCDIC-compatible versions of the ksh_isXXXXX() and
ksh_toXXXXX() macros
* Define EBCDIC-compatible versions of the CTRL(), UNCTRL() and ISCTRL()
macros. Because the C0 control characters are mapped directly from
ASCII printable characters (Ctrl-D is 'D' & 0x1F, i.e. EOT), I figured
a whole-nine-yards translation table was needed for CTRL() and
UNCTRL(), and so those are implemented as table-lookup functions for
speed.
Because a function call can be used neither in a switch case label nor
(for xlc) in an array initializer, I added a new CCTRL() macro (the
extra "C" is for "constant") that expands to an expression the
compiler can evaluate at compile time. To keep this macro at a
reasonable size, however, it only accepts one "slice" of ASCII
characters: CCTRL('A') must be used instead of CCTRL('a'), and so on.
I implemented ISCTRL() as a macro. This definition is valid, but EBCDIC
has many more control characters (0x00-0x3F and 0xFF) than there are
ASCII C0 codes, so it also accepts codes that no Ctrl-key combination
produces. See the commented-out definition of ebcdic_isctrl() in
misc.c for an alternate implementation
* The ebcdic_*() functions should probably be renamed to better fit mksh
conventions
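A sketch of the two sh.h points about ORD() and the split alphabet,
with EBCDIC 1047 values hard-coded so it runs on an ASCII host. The
helper names are hypothetical; the patch's ksh_islower() instead
combines a range check with the existing ctype table (ksh_isalphx()),
which filters out the same gap characters.

```c
#include <assert.h>
#include <stdbool.h>

/* Same definition as the patch's ORD(): always yields 0..255. */
#define ORD(c) ((int)(unsigned char)(c))

/*
 * Lowercase letters in IBM-1047 occupy three separate runs; the gaps
 * hold other characters (e.g. 0x8F is the plus-minus sign).
 */
static bool
ebcdic1047_islower(int c)
{
	int o = ORD(c);

	return (o >= 0x81 && o <= 0x89) ||	/* a-i */
	    (o >= 0x91 && o <= 0x99) ||		/* j-r */
	    (o >= 0xA2 && o <= 0xA9);		/* s-z */
}

/* The naive ASCII-style test, using the 1047 values of 'a' and 'z'. */
static bool
naive_islower(int c)
{
	return ORD(c) >= 0x81 && ORD(c) <= 0xA9;
}
```

With signed chars, the 1047 'A' (0xC1) reads back as -63, and ORD()
turns it into 0xC1 again before any comparison or subtraction.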
+++ edit.c (back to patch order)
* I don't understand exactly what is_mfs() is used for, but I'm pretty
sure we can't do the & 0x80 with EBCDIC (note that e.g. 'A' == 0xC1)
* Use CCTRL() instead of CTRL() in this array initialization. I did not
adjust the indentation of the adjacent lines to keep the patch size
down, but that will be desirable
* Don't know much about XFUNC_VALUE(), but that & 0x7F looks un-kosher
for EBCDIC
* Don't use lowercase letters in CCTRL()
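To illustrate why CCTRL() is needed: a chain of ?: comparisons between
character constants is an integer constant expression, so unlike a call
to ebcdic_ctrl() it can appear in a case label or a static initializer.
A trimmed-down sketch with only three keys (the macro name is
hypothetical; the values are the IBM-1047 codes used in the patch):

```c
#include <assert.h>

/* Ctrl-D -> EOT 0x37, Ctrl-H -> BS 0x16, Ctrl-[ -> ESC 0x27 */
#define CCTRL_1047(x)	((x) == 'D' ? 0x37 : \
			 (x) == 'H' ? 0x16 : \
			 (x) == '[' ? 0x27 : 0)

/* Allowed: the macro is a constant expression. */
static const int eot_key = CCTRL_1047('D');

static const char *
describe(int code)
{
	switch (code) {
	case CCTRL_1047('D'):	/* also allowed in a case label */
		return "eot";
	case CCTRL_1047('H'):
		return "backspace";
	case CCTRL_1047('['):
		return "escape";
	default:
		return "other";
	}
}
```

A lowercase argument falls through to 0, which is why the edit.c hunks
turn CTRL('x') into CCTRL('X').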
+++ main.c
* Need to initialize the EBCDIC escape translation tables at startup
+++ misc.c
* IBM z/OS already has a function named cclass():
$ grep cclass /usr/include/*.h
/usr/include/collate.h: int cclass(char *, collel_t **);
The signature clash was breaking the build, so our static function is
now named mksh_cclass()
* Don't assume the A-Z alphabet is contiguous
* Relocated the comment about escape portability to sh.h
* Control-key mapping escape table implementation, including the bit of
Perl I used to generate the actual mapping code. The control code for
'?' is handled as a special case (but could be incorporated into the
Perl if desired)
* Note the commented-out ebcdic_isctrl() function. This may or may not
be preferable to the EBCDIC ISCTRL() macro currently in sh.h. Be
aware that this function will return true for far fewer inputs than
the macro
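A reduced sketch of the lookup-table approach in misc.c, using a few
rows of the same map. Two details are worth checking against the
patch: the tables need UCHAR_MAX + 1 entries for ORD() to be able to
index 0xFF, and the canonical key of each row should be chosen by its
position in the map (every third entry), not by its character value.
The names here are illustrative; on an ASCII host the key literals are
ASCII while the codes are IBM-1047, which is fine for exercising the
table logic.

```c
#include <assert.h>
#include <limits.h>

#define ORD(c) ((int)(unsigned char)(c))

/* Each group of three: canonical key first, then its two aliases. */
static const int ctrl_map[][2] = {
	{ 'A', 0x01 }, { '!', 0x01 }, { 'a', 0x01 },
	{ 'D', 0x37 }, { '$', 0x37 }, { 'd', 0x37 },
	{ 'H', 0x16 }, { '(', 0x16 }, { 'h', 0x16 },
	{ '\0', 0 }
};

/* UCHAR_MAX + 1 entries, so that index 0xFF is in bounds. */
static int ctrl_table[UCHAR_MAX + 1];
static int unctrl_table[UCHAR_MAX + 1];

static void
init_ctrl_tables(void)
{
	int i;

	for (i = 0; ctrl_map[i][0] != '\0'; i++) {
		int ch = ORD(ctrl_map[i][0]);
		int cc = ctrl_map[i][1];

		ctrl_table[ch] = cc;
		/* only the first entry of each group maps back */
		if (i % 3 == 0)
			unctrl_table[cc] = ch;
	}
}
```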
+++ var.c
* Check for upper/lowercase 'X' without resorting to ASCII trickery
* Use the ORD() macro so that these subtractions don't inadvertently
become additions
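One caveat with the var.c approach: subtracting ORD('a') yields the
right digit value only for letters in the same contiguous run as 'a'.
In EBCDIC 1047 that run is a to i, so for bases above 19 (mksh accepts
bases up to 36) 'j' would come out as 26 rather than 19. A
charset-independent sketch looks letters up by position in a string
instead; digit_value() is a hypothetical helper, not mksh code:

```c
#include <assert.h>
#include <string.h>

/*
 * Digit value of c in the given base (2..36), or -1 if c is not a
 * valid digit. Letters are found by their position in a string
 * literal, so no contiguous alphabet is assumed.
 */
static int
digit_value(int c, int base)
{
	static const char lower[] = "0123456789abcdefghijklmnopqrstuvwxyz";
	static const char upper[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
	const char *p;
	int v;

	if (c == '\0')
		return (-1);	/* strchr() would match the terminator */
	if ((p = strchr(lower, c)) != NULL)
		v = (int)(p - lower);
	else if ((p = strchr(upper, c)) != NULL)
		v = (int)(p - upper);
	else
		return (-1);
	return (v < base ? v : -1);
}
```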
I will be happy to provide further testing and answer any questions
as needed.
--Daniel
P.S.: Please Cc: me in any replies, as I am not subscribed to this list.
--
Daniel Richard G.
My ASCII-art .sig got a bad case of Times New Roman.
Index: Build.sh
===================================================================
RCS file: /cvs/src/bin/mksh/Build.sh,v
retrieving revision 1.674
diff -u -r1.674 Build.sh
--- Build.sh 19 Apr 2015 18:50:59 -0000 1.674
+++ Build.sh 24 Apr 2015 02:12:22 -0000
@@ -419,7 +419,11 @@
na=0
fi
hf=$1; shift
- hv=`echo "$hf" | tr -d '\012\015' | tr -c $alll$allu$alln $alls`
+ case "$TARGET_OS" in
+ OS/390) lfcr='\n\r' ;; # EBCDIC goofiness
+ *) lfcr='\012\015' ;;
+ esac
+ hv=`echo "$hf" | tr -d "$lfcr" | tr -c $alll$allu$alln $alls`
echo "/* NeXTstep bug workaround */" >x
for i
do
@@ -577,7 +581,7 @@
echo "$me: Error: ./$tfn is a directory!" >&2
exit 1
fi
-rmf a.exe* a.out* conftest.c *core core.* lft ${tfn}* no *.bc *.ll *.o *.gen \
+rmf a.exe* a.out* conftest.c *core core.* lft ${tfn}* no *.bc *.dbg *.ll *.o *.gen \
Rebuild.sh signames.inc test.sh x vv.out
SRCS="lalloc.c eval.c exec.c expr.c funcs.c histrap.c jobs.c"
@@ -829,6 +833,12 @@
OpenBSD)
: ${HAVE_SETLOCALE_CTYPE=0}
;;
+OS/390)
+ SIZE=: # not available
+ add_cppflags -DNSIG=32
+ add_cppflags -D_ALL_SOURCE
+ oswarn="; EBCDIC support is incomplete"
+ ;;
OSF1)
HAVE_SIG_T=0 # incompatible
add_cppflags -D_OSF_SOURCE
@@ -929,6 +939,7 @@
: ${AWK=awk} ${CC=cc} ${NROFF=nroff} ${SIZE=size}
test 0 = $r && echo | $NROFF -v 2>&1 | grep GNU >/dev/null 2>&1 && \
+ echo | $NROFF -c >/dev/null 2>&1 && \
NROFF="$NROFF -c"
# this aids me in tracing FTBFSen without access to the buildd
@@ -1327,8 +1338,16 @@
DOWARN=-Wc,-we
;;
xlc)
- save_NOWARN=-qflag=i:e
- DOWARN=-qflag=i:i
+ case "$TARGET_OS" in
+ OS/390)
+ save_NOWARN=-qflag=e
+ DOWARN=-qflag=i
+ ;;
+ *)
+ save_NOWARN=-qflag=i:e
+ DOWARN=-qflag=i:i
+ ;;
+ esac
;;
*)
test x"$save_NOWARN" = x"" && save_NOWARN=-Wno-error
@@ -1493,10 +1512,25 @@
ac_flags 1 extansi -Xa
;;
xlc)
- ac_flags 1 rodata "-qro -qroconst -qroptr"
- ac_flags 1 rtcheck -qcheck=all
- #ac_flags 1 rtchkc -qextchk # reported broken
- ac_flags 1 wformat "-qformat=all -qformat=nozln"
+ case "$TARGET_OS" in
+ OS/390)
+ # On IBM z/OS, the following are warnings by default
+ # CCN3296: #include file <foo.h> not found.
+ # CCN3944: Attribute "__foo__" is not supported and is ignored.
+ # CCN3963: The attribute "foo" is not a valid variable
+ # attribute and is ignored.
+ ac_flags 1 halton "-qhaltonmsg=CCN3296 -qhaltonmsg=CCN3944 -qhaltonmsg=CCN3963"
+ # CCN3290: Unknown macro name FOO on #undef directive.
+ # CCN4108: The use of keyword '__attribute__' is non-portable.
+ ac_flags 1 supprss "-qsuppress=CCN3290 -qsuppress=CCN4108"
+ ;;
+ *)
+ ac_flags 1 rodata "-qro -qroconst -qroptr"
+ ac_flags 1 rtcheck -qcheck=all
+ #ac_flags 1 rtchkc -qextchk # reported broken
+ ac_flags 1 wformat "-qformat=all -qformat=nozln"
+ ;;
+ esac
#ac_flags 1 wp64 -qwarn64 # too verbose for now
;;
esac
@@ -2628,8 +2662,8 @@
MKSH_ASSUME_UTF8 (0=disabled, 1=enabled; default: unset)
MKSH_BINSHPOSIX if */sh or */-sh, enable set -o posix
MKSH_BINSHREDUCED if */sh or */-sh, enable set -o sh
-MKSH_CLRTOEOL_STRING "\033[K"
-MKSH_CLS_STRING "\033[;H\033[J"
+MKSH_CLRTOEOL_STRING "\033[K" (replace \033 with \047 on EBCDIC)
+MKSH_CLS_STRING "\033[;H\033[J" (likewise)
MKSH_CONSERVATIVE_FDS fd 0-9 for scripts, shell only up to 31
MKSH_DEFAULT_EXECSHELL "/bin/sh" (do not change)
MKSH_DEFAULT_PROFILEDIR "/etc" (do not change)
Index: check.pl
===================================================================
RCS file: /cvs/src/bin/mksh/check.pl,v
retrieving revision 1.38
diff -u -r1.38 check.pl
--- check.pl 8 Mar 2015 22:54:55 -0000 1.38
+++ check.pl 24 Apr 2015 02:12:22 -0000
@@ -1165,7 +1165,7 @@
print STDERR "$prog:$test{':long-name'}: expected-exit value $val not in 0..255\n";
return undef;
}
- } elsif ($val !~ /^([\s<>+-=*%\/&|!()]|\b[wse]\b|\bSIG[A-Z][A-Z0-9]*\b)+$/) {
+ } elsif ($val !~ /^([\s\d<>+-=*%\/&|!()]|\b[wse]\b|\bSIG[A-Z][A-Z0-9]*\b)+$/) {
print STDERR "$prog:$test{':long-name'}: bad expected-exit expression: $val\n";
return undef;
}
Index: check.t
===================================================================
RCS file: /cvs/src/bin/mksh/check.t,v
retrieving revision 1.690
diff -u -r1.690 check.t
--- check.t 19 Apr 2015 19:18:27 -0000 1.690
+++ check.t 24 Apr 2015 02:12:24 -0000
@@ -1216,7 +1216,7 @@
cd -P$1 subdir
echo 2=$?,${PWD#$bwd/}
cd $bwd
- chmod 755 renamed
+ chmod 755 noread renamed 2>/dev/null
rm -rf noread link renamed
stdin:
export TSHELL="$__progname"
Index: edit.c
===================================================================
RCS file: /cvs/src/bin/mksh/edit.c,v
retrieving revision 1.284
diff -u -r1.284 edit.c
--- edit.c 11 Apr 2015 22:10:12 -0000 1.284
+++ edit.c 24 Apr 2015 02:12:25 -0000
@@ -37,10 +37,10 @@
* which do a full power cycle then...
*/
#ifndef MKSH_CLS_STRING
-#define MKSH_CLS_STRING "\033[;H\033[J"
+#define MKSH_CLS_STRING MKSH_ESCAPE_STRING "[;H" MKSH_ESCAPE_STRING "[J"
#endif
#ifndef MKSH_CLRTOEOL_STRING
-#define MKSH_CLRTOEOL_STRING "\033[K"
+#define MKSH_CLRTOEOL_STRING MKSH_ESCAPE_STRING "[K"
#endif
/* tty driver characters we are interested in */
@@ -885,7 +885,11 @@
/* Separator for completion */
#define is_cfs(c) ((c) == ' ' || (c) == '\t' || (c) == '"' || (c) == '\'')
/* Separator for motion */
-#define is_mfs(c) (!(ksh_isalnux(c) || (c) == '$' || ((c) & 0x80)))
+#ifdef MKSH_EBCDIC
+# define is_mfs(c) (!(ksh_isalnux(c) || (c) == '$'))
+#else
+# define is_mfs(c) (!(ksh_isalnux(c) || (c) == '$' || ((c) & 0x80)))
+#endif
#define X_NTABS 3 /* normal, meta1, meta2 */
#define X_TABSZ 256 /* size of keydef tables etc */
@@ -1010,56 +1014,56 @@
};
static struct x_defbindings const x_defbindings[] = {
- { XFUNC_del_back, 0, CTRL('?') },
- { XFUNC_del_bword, 1, CTRL('?') },
- { XFUNC_eot_del, 0, CTRL('D') },
- { XFUNC_del_back, 0, CTRL('H') },
- { XFUNC_del_bword, 1, CTRL('H') },
+ { XFUNC_del_back, 0, CCTRL('?') },
+ { XFUNC_del_bword, 1, CCTRL('?') },
+ { XFUNC_eot_del, 0, CCTRL('D') },
+ { XFUNC_del_back, 0, CCTRL('H') },
+ { XFUNC_del_bword, 1, CCTRL('H') },
{ XFUNC_del_bword, 1, 'h' },
{ XFUNC_mv_bword, 1, 'b' },
{ XFUNC_mv_fword, 1, 'f' },
{ XFUNC_del_fword, 1, 'd' },
- { XFUNC_mv_back, 0, CTRL('B') },
- { XFUNC_mv_forw, 0, CTRL('F') },
- { XFUNC_search_char_forw, 0, CTRL(']') },
- { XFUNC_search_char_back, 1, CTRL(']') },
- { XFUNC_newline, 0, CTRL('M') },
- { XFUNC_newline, 0, CTRL('J') },
- { XFUNC_end_of_text, 0, CTRL('_') },
- { XFUNC_abort, 0, CTRL('G') },
- { XFUNC_prev_com, 0, CTRL('P') },
- { XFUNC_next_com, 0, CTRL('N') },
- { XFUNC_nl_next_com, 0, CTRL('O') },
- { XFUNC_search_hist, 0, CTRL('R') },
+ { XFUNC_mv_back, 0, CCTRL('B') },
+ { XFUNC_mv_forw, 0, CCTRL('F') },
+ { XFUNC_search_char_forw, 0, CCTRL(']') },
+ { XFUNC_search_char_back, 1, CCTRL(']') },
+ { XFUNC_newline, 0, CCTRL('M') },
+ { XFUNC_newline, 0, CCTRL('J') },
+ { XFUNC_end_of_text, 0, CCTRL('_') },
+ { XFUNC_abort, 0, CCTRL('G') },
+ { XFUNC_prev_com, 0, CCTRL('P') },
+ { XFUNC_next_com, 0, CCTRL('N') },
+ { XFUNC_nl_next_com, 0, CCTRL('O') },
+ { XFUNC_search_hist, 0, CCTRL('R') },
{ XFUNC_beg_hist, 1, '<' },
{ XFUNC_end_hist, 1, '>' },
{ XFUNC_goto_hist, 1, 'g' },
- { XFUNC_mv_end, 0, CTRL('E') },
- { XFUNC_mv_begin, 0, CTRL('A') },
- { XFUNC_draw_line, 0, CTRL('L') },
- { XFUNC_cls, 1, CTRL('L') },
- { XFUNC_meta1, 0, CTRL('[') },
- { XFUNC_meta2, 0, CTRL('X') },
- { XFUNC_kill, 0, CTRL('K') },
- { XFUNC_yank, 0, CTRL('Y') },
+ { XFUNC_mv_end, 0, CCTRL('E') },
+ { XFUNC_mv_begin, 0, CCTRL('A') },
+ { XFUNC_draw_line, 0, CCTRL('L') },
+ { XFUNC_cls, 1, CCTRL('L') },
+ { XFUNC_meta1, 0, CCTRL('[') },
+ { XFUNC_meta2, 0, CCTRL('X') },
+ { XFUNC_kill, 0, CCTRL('K') },
+ { XFUNC_yank, 0, CCTRL('Y') },
{ XFUNC_meta_yank, 1, 'y' },
- { XFUNC_literal, 0, CTRL('^') },
+ { XFUNC_literal, 0, CCTRL('^') },
{ XFUNC_comment, 1, '#' },
- { XFUNC_transpose, 0, CTRL('T') },
- { XFUNC_complete, 1, CTRL('[') },
- { XFUNC_comp_list, 0, CTRL('I') },
+ { XFUNC_transpose, 0, CCTRL('T') },
+ { XFUNC_complete, 1, CCTRL('[') },
+ { XFUNC_comp_list, 0, CCTRL('I') },
{ XFUNC_comp_list, 1, '=' },
{ XFUNC_enumerate, 1, '?' },
{ XFUNC_expand, 1, '*' },
- { XFUNC_comp_file, 1, CTRL('X') },
- { XFUNC_comp_comm, 2, CTRL('[') },
+ { XFUNC_comp_file, 1, CCTRL('X') },
+ { XFUNC_comp_comm, 2, CCTRL('[') },
{ XFUNC_list_comm, 2, '?' },
- { XFUNC_list_file, 2, CTRL('Y') },
+ { XFUNC_list_file, 2, CCTRL('Y') },
{ XFUNC_set_mark, 1, ' ' },
- { XFUNC_kill_region, 0, CTRL('W') },
- { XFUNC_xchg_point_mark, 2, CTRL('X') },
- { XFUNC_literal, 0, CTRL('V') },
- { XFUNC_version, 1, CTRL('V') },
+ { XFUNC_kill_region, 0, CCTRL('W') },
+ { XFUNC_xchg_point_mark, 2, CCTRL('X') },
+ { XFUNC_literal, 0, CCTRL('V') },
+ { XFUNC_version, 1, CCTRL('V') },
{ XFUNC_prev_histword, 1, '.' },
{ XFUNC_prev_histword, 1, '_' },
{ XFUNC_set_arg, 1, '0' },
@@ -1126,7 +1130,7 @@
}
}
-#ifdef MKSH_SMALL
+#if defined(MKSH_SMALL) || defined(MKSH_EBCDIC)
#define XFUNC_VALUE(f) (f)
#else
#define XFUNC_VALUE(f) (f & 0x7F)
@@ -2885,7 +2889,7 @@
} else
x_putc(c);
switch (c) {
- case 7:
+ case MKSH_BELL_CHAR:
break;
case '\r':
case '\n':
@@ -2921,7 +2925,7 @@
x_putc(c);
}
switch (c) {
- case 7:
+ case MKSH_BELL_CHAR:
break;
case '\r':
case '\n':
@@ -3976,7 +3980,7 @@
case '\n':
return (1);
- case CTRL('['):
+ case CCTRL('['):
expanded = NONE;
if (first_insert) {
first_insert = false;
@@ -3994,19 +3998,19 @@
return (redo_insert(lastac - 1));
/* { Begin nonstandard vi commands */
- case CTRL('x'):
+ case CCTRL('X'):
expand_word(0);
break;
- case CTRL('f'):
+ case CCTRL('F'):
complete_word(0, 0);
break;
- case CTRL('e'):
+ case CCTRL('E'):
print_expansions(es, 0);
break;
- case CTRL('i'):
+ case CCTRL('I'):
if (Flag(FVITABCOMPLETE)) {
complete_word(0, 0);
break;
@@ -4061,8 +4065,8 @@
}
switch (*cmd) {
- case CTRL('l'):
- case CTRL('r'):
+ case CCTRL('L'):
+ case CCTRL('R'):
redraw_line(true);
break;
@@ -4249,7 +4253,7 @@
case 'j':
case '+':
- case CTRL('n'):
+ case CCTRL('N'):
if (grabhist(modified, hnum + argcnt) < 0)
return (-1);
else {
@@ -4260,7 +4264,7 @@
case 'k':
case '-':
- case CTRL('p'):
+ case CCTRL('P'):
if (grabhist(modified, hnum - argcnt) < 0)
return (-1);
else {
@@ -4492,26 +4496,26 @@
/* AT&T ksh */
case '=':
/* Nonstandard vi/ksh */
- case CTRL('e'):
+ case CCTRL('E'):
print_expansions(es, 1);
break;
/* Nonstandard vi/ksh */
- case CTRL('i'):
+ case CCTRL('I'):
if (!Flag(FVITABCOMPLETE))
return (-1);
complete_word(1, argcnt);
break;
/* some annoying AT&T kshs */
- case CTRL('['):
+ case CCTRL('['):
if (!Flag(FVIESCCOMPLETE))
return (-1);
/* AT&T ksh */
case '\\':
/* Nonstandard vi/ksh */
- case CTRL('f'):
+ case CCTRL('F'):
complete_word(1, argcnt);
break;
@@ -4519,7 +4523,7 @@
/* AT&T ksh */
case '*':
/* Nonstandard vi/ksh */
- case CTRL('x'):
+ case CCTRL('X'):
expand_word(1);
break;
@@ -4598,7 +4602,7 @@
break;
case 'h':
- case CTRL('h'):
+ case CCTRL('H'):
if (!sub && es->cursor == 0)
return (-1);
ncursor = es->cursor - argcnt;
Index: main.c
===================================================================
RCS file: /cvs/src/bin/mksh/main.c,v
retrieving revision 1.292
diff -u -r1.292 main.c
--- main.c 19 Apr 2015 18:51:01 -0000 1.292
+++ main.c 24 Apr 2015 02:12:25 -0000
@@ -282,6 +282,10 @@
initctypes();
+#ifdef MKSH_EBCDIC
+ initebcdic();
+#endif
+
inittraps();
coproc_init();
Index: misc.c
===================================================================
RCS file: /cvs/src/bin/mksh/misc.c,v
retrieving revision 1.226
diff -u -r1.226 misc.c
--- misc.c 20 Mar 2015 21:47:04 -0000 1.226
+++ misc.c 24 Apr 2015 02:12:25 -0000
@@ -52,7 +52,7 @@
const unsigned char *, bool) MKSH_A_PURE;
static int do_gmatch(const unsigned char *, const unsigned char *,
const unsigned char *, const unsigned char *) MKSH_A_PURE;
-static const unsigned char *cclass(const unsigned char *, unsigned char)
+static const unsigned char *mksh_cclass(const unsigned char *, unsigned char)
MKSH_A_PURE;
#ifdef KSH_CHVT_CODE
static void chvt(const Getopt *);
@@ -93,12 +93,8 @@
void
initctypes(void)
{
- int c;
-
- for (c = 'a'; c <= 'z'; c++)
- chtypes[c] |= C_ALPHA;
- for (c = 'A'; c <= 'Z'; c++)
- chtypes[c] |= C_ALPHA;
+ setctypes("abcdefghijklmnopqrstuvwxyz", C_ALPHA);
+ setctypes("ABCDEFGHIJKLMNOPQRSTUVWXYZ", C_ALPHA);
chtypes['_'] |= C_ALPHA;
setctypes("0123456789", C_DIGIT);
/* \0 added automatically */
@@ -776,7 +772,7 @@
}
switch (*p++) {
case '[':
- if (sc == 0 || (p = cclass(p, sc)) == NULL)
+ if (sc == 0 || (p = mksh_cclass(p, sc)) == NULL)
return (0);
break;
@@ -889,7 +885,7 @@
}
static const unsigned char *
-cclass(const unsigned char *p, unsigned char sub)
+mksh_cclass(const unsigned char *p, unsigned char sub)
{
unsigned char c, d;
bool notp, found = false;
@@ -1159,7 +1155,7 @@
++p;
switch (c) {
/* see unbksl() in this file for comments */
- case 7:
+ case MKSH_BELL_CHAR:
c = 'a';
if (0)
/* FALLTHROUGH */
@@ -1183,11 +1179,11 @@
c = 't';
if (0)
/* FALLTHROUGH */
- case 11:
+ case MKSH_VTAB_CHAR:
c = 'v';
if (0)
/* FALLTHROUGH */
- case '\033':
+ case MKSH_ESCAPE_CHAR:
/* take E not e because \e is \ in *roff */
c = 'E';
/* FALLTHROUGH */
@@ -1197,7 +1193,11 @@
if (0)
/* FALLTHROUGH */
default:
+#ifdef MKSH_EBCDIC
+ if (c < 64 || c == 0xFF) {
+#else
if (c < 32 || c > 0x7E) {
+#endif
/* FALLTHROUGH */
case '\'':
shf_fprintf(shf, "\\%03o", c);
@@ -2136,13 +2136,7 @@
fc = (*fg)();
switch (fc) {
case 'a':
- /*
- * according to the comments in pdksh, \007 seems
- * to be more portable than \a (due to HP-UX cc,
- * Ultrix cc, old pcc, etc.) so we avoid the escape
- * sequence altogether in mksh and assume ASCII
- */
- wc = 7;
+ wc = MKSH_BELL_CHAR;
break;
case 'b':
wc = '\b';
@@ -2155,7 +2149,7 @@
break;
case 'E':
case 'e':
- wc = 033;
+ wc = MKSH_ESCAPE_CHAR;
break;
case 'f':
wc = '\f';
@@ -2170,8 +2164,7 @@
wc = '\t';
break;
case 'v':
- /* assume ASCII here as well */
- wc = 11;
+ wc = MKSH_VTAB_CHAR;
break;
case '1':
case '2':
@@ -2253,3 +2246,144 @@
return (wc);
}
+
+#ifdef MKSH_EBCDIC
+
+/*
+ * The mapping of control keys to C0 control codes (e.g. Ctrl-D => EOT)
+ * is directly tied to ASCII, so for EBCDIC we have no option but to use
+ * conversion tables if we want to keep the same mapping.
+ *
+ * The mappings are generated with the following Perl code:
+ *
+--------------------------------
+#!/usr/bin/env perl
+
+use Text::Iconv;
+use strict;
+
+my $cvt = Text::Iconv->new("iso8859-1", "ibm-1047");
+
+sub fmtchr($)
+{
+ my $n = $_[0];
+ my $c = chr($n);
+ $c = "\\$c" if $c =~ /['\\]/;
+ $c = "\\177" if $n eq 0x7F;
+ return $c;
+}
+
+# generate ebcdic_ctrl_map[][] array
+for (my $i = 0x40; $i <= 0x5F; $i++)
+{
+ my $a = chr($i & 0x1f);
+ my $e = $cvt->convert($a);
+ printf "\t{ '%s', 0x%02X }, { '%s', 0x%02X }, { '%s', 0x%02X },\n",
+ fmtchr($i), ord($e),
+ fmtchr($i - 0x20), ord($e),
+ fmtchr($i + 0x20), ord($e);
+}
+
+# generate CCTRL() macro
+for (my $i = 0x41; $i <= 0x5F; $i++)
+{
+ my $a = chr($i & 0x1f);
+ my $e = $cvt->convert($a);
+ printf "\t\t\t (x) == '%s' ? 0x%02X : \\\n", fmtchr($i), ord($e);
+}
+--------------------------------
+ *
+ */
+
+static const int ebcdic_ctrl_map[][2] = {
+ /*
+ * ebcdic_ctrl_map[x][0] = ASCII control key
+ * ebcdic_ctrl_map[x][1] = EBCDIC control code
+ *
+ * Example: Ctrl-D -> 'D' -> ASCII 0x04 -> EOT -> EBCDIC 0x37
+ *
+ * Column 1 represents the standard ASCII control keys
+ * Column 2 is the col. 1 set shifted minus 0x20
+ * Column 3 is the col. 1 set shifted plus 0x20
+ *
+ * Note: '?' is special-cased below
+ */
+ { '@', 0x00 }, { ' ', 0x00 }, { '`', 0x00 },
+ { 'A', 0x01 }, { '!', 0x01 }, { 'a', 0x01 },
+ { 'B', 0x02 }, { '"', 0x02 }, { 'b', 0x02 },
+ { 'C', 0x03 }, { '#', 0x03 }, { 'c', 0x03 },
+ { 'D', 0x37 }, { '$', 0x37 }, { 'd', 0x37 },
+ { 'E', 0x2D }, { '%', 0x2D }, { 'e', 0x2D },
+ { 'F', 0x2E }, { '&', 0x2E }, { 'f', 0x2E },
+ { 'G', 0x2F }, { '\'',0x2F }, { 'g', 0x2F },
+ { 'H', 0x16 }, { '(', 0x16 }, { 'h', 0x16 },
+ { 'I', 0x05 }, { ')', 0x05 }, { 'i', 0x05 },
+ { 'J', 0x25 }, { '*', 0x25 }, { 'j', 0x25 },
+ { 'K', 0x0B }, { '+', 0x0B }, { 'k', 0x0B },
+ { 'L', 0x0C }, { ',', 0x0C }, { 'l', 0x0C },
+ { 'M', 0x0D }, { '-', 0x0D }, { 'm', 0x0D },
+ { 'N', 0x0E }, { '.', 0x0E }, { 'n', 0x0E },
+ { 'O', 0x0F }, { '/', 0x0F }, { 'o', 0x0F },
+ { 'P', 0x10 }, { '0', 0x10 }, { 'p', 0x10 },
+ { 'Q', 0x11 }, { '1', 0x11 }, { 'q', 0x11 },
+ { 'R', 0x12 }, { '2', 0x12 }, { 'r', 0x12 },
+ { 'S', 0x13 }, { '3', 0x13 }, { 's', 0x13 },
+ { 'T', 0x3C }, { '4', 0x3C }, { 't', 0x3C },
+ { 'U', 0x3D }, { '5', 0x3D }, { 'u', 0x3D },
+ { 'V', 0x32 }, { '6', 0x32 }, { 'v', 0x32 },
+ { 'W', 0x26 }, { '7', 0x26 }, { 'w', 0x26 },
+ { 'X', 0x18 }, { '8', 0x18 }, { 'x', 0x18 },
+ { 'Y', 0x19 }, { '9', 0x19 }, { 'y', 0x19 },
+ { 'Z', 0x3F }, { ':', 0x3F }, { 'z', 0x3F },
+ { '[', 0x27 }, { ';', 0x27 }, { '{', 0x27 },
+ { '\\',0x1C }, { '<', 0x1C }, { '|', 0x1C },
+ { ']', 0x1D }, { '=', 0x1D }, { '}', 0x1D },
+ { '^', 0x1E }, { '>', 0x1E }, { '~', 0x1E },
+ { '_', 0x1F }, { '?', 0x1F }, { '\177', 0x1F },
+ { '\0', 0 }
+};
+
+static int ebcdic_ctrl_table[UCHAR_MAX + 1] = { 0 };
+static int ebcdic_unctrl_table[UCHAR_MAX + 1] = { 0 };
+
+void
+initebcdic(void)
+{
+ int i;
+
+ for (i = 0; i < UCHAR_MAX; i++) {
+ ebcdic_ctrl_table[i] = 0;
+ ebcdic_unctrl_table[i] = 0;
+ }
+
+ for (i = 0; ebcdic_ctrl_map[i][0]; i++) {
+ int ch = ORD(ebcdic_ctrl_map[i][0]);
+ int cc = ebcdic_ctrl_map[i][1];
+ ebcdic_ctrl_table[ch] = cc;
+ if (i % 3 == 0)
+ ebcdic_unctrl_table[cc] = ch;
+ }
+
+ /* special case */
+ ebcdic_ctrl_table[ORD('?')] = 0x07;
+ ebcdic_unctrl_table[0x07] = '?';
+}
+
+int ebcdic_ctrl(int c)
+{
+ return ebcdic_ctrl_table[ORD(c)];
+}
+
+int ebcdic_unctrl(int c)
+{
+ return ebcdic_unctrl_table[ORD(c)];
+}
+
+/*
+int ebcdic_isctrl(int c)
+{
+ return c == 0 || ebcdic_unctrl_table[ORD(c)] != 0;
+}
+*/
+
+#endif /* MKSH_EBCDIC */
Index: sh.h
===================================================================
RCS file: /cvs/src/bin/mksh/sh.h,v
retrieving revision 1.725
diff -u -r1.725 sh.h
--- sh.h 19 Apr 2015 19:18:31 -0000 1.725
+++ sh.h 24 Apr 2015 02:12:26 -0000
@@ -251,6 +251,64 @@
#ifndef MKSH_INCLUDES_ONLY
+/*
+ * Many headaches with EBCDIC:
+ * 1. There are numerous EBCDIC variants, and it is not feasible for us
+ * to support them all. But we can support the EBCDIC code pages that
+ * contain all (most?) of the characters in ASCII, and these
+ * thankfully tend to agree on the code points assigned to the ASCII
+ * subset. If you need a representative example, look at EBCDIC 1047,
+ * which is first among equals in the IBM MVS development
+ * environment: http://en.wikipedia.org/wiki/EBCDIC_1047
+ * 2. Character ranges that are contiguous in ASCII, like the letters
+ * in [A-Z], are broken up into segments (i.e. [A-IJ-RS-Z]), so we
+ * can't implement e.g. islower() as { return c >= 'a' && c <= 'z'; }
+ * because it will also return true for a handful of extraneous
+ * characters (like the plus-minus sign at 0x8F in EBCDIC 1047, a
+ * little after 'i'). But at least '_' is not one of these.
+ * 3. The normal [0-9A-Za-z] characters are at codepoints beyond 0x80.
+ * Not only do they require all 8 bits instead of 7, if chars are
+ * signed, they will have negative integer values! Something like
+ * (c - 'A') could actually become (c + 63)! Use the ORD() macro to
+ * ensure you're getting a value in [0, 255].
+ * 4. '\n' is actually NL (0x15, U+0085) instead of LF (0x25, U+000A).
+ * EBCDIC has a proper newline character instead of "emulating" one
+ * with line feeds.
+ * 5. Note that it is possible to compile programs in ASCII mode on IBM
+ * mainframe systems, using the -qascii option to the XL C compiler.
+ * We can determine the build mode by looking at __CHARSET_LIB:
+ * 0 == EBCDIC, 1 == ASCII
+ */
+#if defined(__MVS__) && defined(__IBMC__) && !defined(MKSH_EBCDIC)
+# if defined(__CHARSET_LIB) && __CHARSET_LIB
+# ifndef _ENHANCED_ASCII_EXT
+# define _ENHANCED_ASCII_EXT 0xFFFFFFFF /* go all-out on ASCII */
+# endif
+# else
+# define MKSH_EBCDIC
+# endif
+#endif
+
+#ifdef MKSH_EBCDIC
+/*
+ * use symbolic escapes when possible, let the compiler sort it out
+ */
+# define MKSH_BELL_CHAR '\a'
+# define MKSH_ESCAPE_CHAR '\047'
+# define MKSH_ESCAPE_STRING "\047"
+# define MKSH_VTAB_CHAR '\v'
+#else
+/*
+ * according to the comments in pdksh, \007 seems to be more portable
+ * than \a (due to HP-UX cc, Ultrix cc, old pcc, etc.) so we avoid the
+ * latter altogether when we're using ASCII
+ */
+# define MKSH_BELL_CHAR '\007'
+# define MKSH_ESCAPE_CHAR '\033'
+# define MKSH_ESCAPE_STRING "\033"
+# define MKSH_VTAB_CHAR '\013'
+#endif
+
/* extra types */
#if !HAVE_GETRUSAGE
@@ -298,13 +356,24 @@
} while (/* CONSTCOND */ 0)
#endif
-#define ksh_isdigit(c) (((c) >= '0') && ((c) <= '9'))
-#define ksh_islower(c) (((c) >= 'a') && ((c) <= 'z'))
-#define ksh_isupper(c) (((c) >= 'A') && ((c) <= 'Z'))
-#define ksh_tolower(c) (((c) >= 'A') && ((c) <= 'Z') ? (c) - 'A' + 'a' : (c))
-#define ksh_toupper(c) (((c) >= 'a') && ((c) <= 'z') ? (c) - 'a' + 'A' : (c))
+#define ORD(c) ((int)(unsigned char)(c))
+#ifdef MKSH_EBCDIC
+# define ksh_isdigit(c) (ORD(c) >= ORD('0') && ORD(c) <= ORD('9'))
+# define ksh_islower(c) (ORD(c) >= ORD('a') && ORD(c) <= ORD('z') && \
+ ksh_isalphx((c)))
+# define ksh_isupper(c) (ORD(c) >= ORD('A') && ORD(c) <= ORD('Z') && \
+ ksh_isalphx((c)))
+# define ksh_isspace(c) ((c) == '\t' || (c) == '\n' || (c) == '\v' || \
+ (c) == '\f' || (c) == '\r' || (c) == ' ')
+#else
+# define ksh_isdigit(c) (((c) >= '0') && ((c) <= '9'))
+# define ksh_islower(c) (((c) >= 'a') && ((c) <= 'z'))
+# define ksh_isupper(c) (((c) >= 'A') && ((c) <= 'Z'))
+# define ksh_isspace(c) ((((c) >= 0x09) && ((c) <= 0x0D)) || ((c) == 0x20))
+#endif
+#define ksh_tolower(c) (ksh_isupper((c)) ? (c) - 'A' + 'a' : (c))
+#define ksh_toupper(c) (ksh_islower((c)) ? (c) - 'a' + 'A' : (c))
#define ksh_isdash(s) (((s)[0] == '-') && ((s)[1] == '\0'))
-#define ksh_isspace(c) ((((c) >= 0x09) && ((c) <= 0x0D)) || ((c) == 0x20))
#define ksh_min(x,y) ((x) < (y) ? (x) : (y))
#define ksh_max(x,y) ((x) > (y) ? (x) : (y))
@@ -460,7 +529,7 @@
* not a char that is used often. Also, can't use the high bit as it causes
* portability problems (calling strchr(x, 0x80 | 'x') is error prone).
*/
-#define MAGIC (7) /* prefix for *?[!{,} during expand */
+#define MAGIC (MKSH_BELL_CHAR) /* prefix for *?[!{,} during expand */
#define ISMAGIC(c) ((unsigned char)(c) == MAGIC)
EXTERN const char *safe_prompt; /* safe prompt if PS1 substitution fails */
@@ -1583,9 +1652,49 @@
#define HERES 10 /* max number of << in line */
#undef CTRL
-#define CTRL(x) ((x) == '?' ? 0x7F : (x) & 0x1F) /* ASCII */
-#define UNCTRL(x) ((x) ^ 0x40) /* ASCII */
-#define ISCTRL(x) (((signed char)((uint8_t)(x) + 1)) < 33)
+#ifdef MKSH_EBCDIC
+# define CTRL(x) (ebcdic_ctrl(x))
+# define CCTRL(x) ((x) == '?' ? 0x07 : /* special case */ \
+ (x) == 'A' ? 0x01 : \
+ (x) == 'B' ? 0x02 : \
+ (x) == 'C' ? 0x03 : \
+ (x) == 'D' ? 0x37 : \
+ (x) == 'E' ? 0x2D : \
+ (x) == 'F' ? 0x2E : \
+ (x) == 'G' ? 0x2F : \
+ (x) == 'H' ? 0x16 : \
+ (x) == 'I' ? 0x05 : \
+ (x) == 'J' ? 0x25 : \
+ (x) == 'K' ? 0x0B : \
+ (x) == 'L' ? 0x0C : \
+ (x) == 'M' ? 0x0D : \
+ (x) == 'N' ? 0x0E : \
+ (x) == 'O' ? 0x0F : \
+ (x) == 'P' ? 0x10 : \
+ (x) == 'Q' ? 0x11 : \
+ (x) == 'R' ? 0x12 : \
+ (x) == 'S' ? 0x13 : \
+ (x) == 'T' ? 0x3C : \
+ (x) == 'U' ? 0x3D : \
+ (x) == 'V' ? 0x32 : \
+ (x) == 'W' ? 0x26 : \
+ (x) == 'X' ? 0x18 : \
+ (x) == 'Y' ? 0x19 : \
+ (x) == 'Z' ? 0x3F : \
+ (x) == '[' ? 0x27 : \
+ (x) == '\\'? 0x1C : \
+ (x) == ']' ? 0x1D : \
+ (x) == '^' ? 0x1E : \
+ (x) == '_' ? 0x1F : \
+ 0)
+# define UNCTRL(x) (ebcdic_unctrl(x))
+# define ISCTRL(x) (ORD(x) < 64 || ORD(x) == 0xFF)
+#else
+# define CTRL(x) ((x) == '?' ? 0x7F : (x) & 0x1F)
+# define CCTRL(x) CTRL(x)
+# define UNCTRL(x) ((x) ^ 0x40)
+# define ISCTRL(x) (((signed char)((uint8_t)(x) + 1)) < 33)
+#endif
#define IDENT 64
@@ -1884,6 +1993,12 @@
char *strndup_i(const char *, size_t, Area *);
#endif
int unbksl(bool, int (*)(void), void (*)(int));
+#ifdef MKSH_EBCDIC
+void initebcdic(void);
+int ebcdic_ctrl(int);
+int ebcdic_unctrl(int);
+int ebcdic_isctrl(int);
+#endif
/* shf.c */
struct shf *shf_open(const char *, int, int, int);
struct shf *shf_fdopen(int, int, struct shf *);
Index: var.c
===================================================================
RCS file: /cvs/src/bin/mksh/var.c,v
retrieving revision 1.190
diff -u -r1.190 var.c
--- var.c 19 Apr 2015 18:51:02 -0000 1.190
+++ var.c 24 Apr 2015 02:12:26 -0000
@@ -510,7 +510,7 @@
}
if (c == '0' && arith) {
- if ((s[0] | 0x20) == 'x') {
+ if (s[0] == 'x' || s[0] == 'X') {
/* interpret as hexadecimal */
base = 16;
++s;
@@ -553,12 +553,12 @@
continue;
}
if (ksh_isdigit(c))
- c -= '0';
+ c -= ORD('0');
else {
- c |= 0x20;
+ c = ksh_tolower(c);
if (!ksh_islower(c))
return (-1);
- c -= 'a' - 10;
+ c -= ORD('a') - 10;
}
if (c >= base)
return (-1);
