[PATCH] IBM z/OS + EBCDIC support

Daniel Richard G. Thu, 23 Apr 2015 21:03:07 -0700

Hello list,

I'd like to submit some changes that add support for IBM z/OS mainframe
systems (specifically, for mksh running in the OMVS Unix environment),
including compatibility with EBCDIC.


The test suite tallies up as follows in an EBCDIC run:

    Total failed: 52 (4 ignored) (48 unexpected)
    Total passed: 446

This work is therefore not complete: many ASCII-isms in the code are not
yet conditionalized. The main obstacle is that EBCDIC places the ordinary
[0-9A-Za-z] characters above 0x80, so the high bit cannot be set for
signalling purposes, which mksh seems to do a lot.
Addressing this will require greater familiarity with the program than I
could gain in a few days' work.
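
To make the high-bit problem concrete, here is a minimal C sketch. The
code points are hard-coded from EBCDIC code page 1047 (an assumption
about which variant is in use), so the check behaves the same on an
ASCII build machine. The name high_bit_is_free() is hypothetical and
not part of mksh.

```c
#include <stddef.h>

/*
 * Code points of '0', '9', 'A', 'Z', 'a', 'z' in ASCII and in EBCDIC 1047.
 * They are hard-coded so the result does not depend on the character set
 * of the machine compiling this sketch.
 */
static const unsigned char ascii_alnum[]  = { 0x30, 0x39, 0x41, 0x5A, 0x61, 0x7A };
static const unsigned char ebcdic_alnum[] = { 0xF0, 0xF9, 0xC1, 0xE9, 0x81, 0xA9 };

/*
 * Returns 1 if none of the n code points has bit 0x80 set, i.e. the high
 * bit could be borrowed as an out-of-band flag for these characters.
 */
static int
high_bit_is_free(const unsigned char *cp, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (cp[i] & 0x80)
			return 0;
	return 1;
}
```

For the ASCII set the function reports that the high bit is free; for
the EBCDIC set it does not, because every letter and digit already has
that bit set.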

The following files are attached:

1. Patch against current CVS

2. Output of running Build.sh, gzip'ed

3. Output of running test.sh, gzip'ed


Below is a walk-through of the changes in the patch. I'll elide most
things that are self-explanatory or repeated:

+++ Build.sh

* Added clauses for TARGET_OS == "OS/390"

* '\012\015' != '\n\r' on this platform, so use the latter

* When compiling with -g, xlc produces a .dbg file alongside each object
  file, so clean those up

* NSIG is, amazingly, not #defined on this platform. Sure would be nice
  if the fancy logic that calculates NSIG could conditionally #define
  it, rather than a TARGET_OS conditional... :-)

* Check whether "nroff -c" is supported---the system I'm using has GNU
  nroff 1.17, which doesn't have -c

* On this platform, xlc -qflag=... takes only one suboption, not two

* Some special flags are needed for xlc on z/OS that are not needed on
  AIX, such as those that make a missing #include file an error instead
  of a warning (!). Conversely, most of the AIX xlc flags are not
  recognized on z/OS

* Added a note that EBCDIC has \047 as the escape character
  rather than \033
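
Regarding the NSIG wish above, one possible shape for a conditional
definition is sketched below. The fallback order is my assumption:
_NSIG exists on some C libraries, SIGMAX is assumed here to be the
highest signal number where it exists, and 32 is simply the value this
patch passes via -DNSIG=32. The helper signal_table_size() is
hypothetical.

```c
#include <signal.h>

/*
 * Define NSIG only if the system headers did not already do so.
 * The order of the fallbacks is an assumption, not mksh's actual logic.
 */
#ifndef NSIG
# if defined(_NSIG)
#  define NSIG _NSIG
# elif defined(SIGMAX)
#  define NSIG (SIGMAX + 1)	/* assumes SIGMAX is the highest signal */
# else
#  define NSIG 32		/* value used for OS/390 in this patch */
# endif
#endif

/* Number of slots a signal-indexed table would need. */
static int
signal_table_size(void)
{
	return NSIG;
}
```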

+++ check.pl

* I was getting a parse error with an expected-exit value of "e != 0",
  and adding \d to this regex fixed things... this wasn't breaking for
  other folks?
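
A possible explanation, offered as a guess rather than something I
have confirmed: inside the bracket expression, "+-=" reads as a range
from '+' to '='. In ASCII that range (0x2B to 0x3D) happens to contain
the digits, so "e != 0" would already match there. In EBCDIC 1047 the
digits sit at 0xF0 and up, outside 0x4E to 0x7E. The sketch below
checks this with hard-coded code points; in_range() and the enum names
are hypothetical.

```c
/* Returns 1 if code point c lies in the inclusive range lo..hi. */
static int
in_range(unsigned int lo, unsigned int hi, unsigned int c)
{
	return c >= lo && c <= hi;
}

/* Code points of '+', '=' and '0' in ASCII. */
enum { ASCII_PLUS = 0x2B, ASCII_EQUALS = 0x3D, ASCII_ZERO = 0x30 };

/* The same characters in EBCDIC 1047 (assumed code page). */
enum { EBCDIC_PLUS = 0x4E, EBCDIC_EQUALS = 0x7E, EBCDIC_ZERO = 0xF0 };
```

If this is right, in_range(ASCII_PLUS, ASCII_EQUALS, ASCII_ZERO) is
true while the EBCDIC equivalent is false, which would explain why
only the EBCDIC run needed the explicit \d.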

+++ check.t

* The "cd-pe" test fails on this system (perhaps it should be disabled?)
  and the directories were not getting cleaned up properly

+++ sh.h (out of patch order, as other files use these changes)

* Wall-o'-text about EBCDIC for the newbies

* Added logic to detect and enable MKSH_EBCDIC automatically. Here I
  rely on detecting the IBM compiler and __CHARSET_LIB; an alternate
  (if more verbose) approach is the way C_CTYPE_ASCII is defined in
  https://github.com/c9/node-gnu-tools/blob/master/grep-src/lib/c-ctype.h

* If compiling in ASCII mode, #define _ENHANCED_ASCII_EXT so that as
  many C/system calls are switched to ASCII as possible (this is
  something I was experimenting with, but it's not how most people would
  be building/using mksh on this system)

* Define symbols for some common character/string escape literals so we
  can swap them out easily

* Because EBCDIC characters like 'A' will have a negative value if
  signed chars are being used, #define the ORD() macro so we can always
  get an integer value in [0, 255]

* Added EBCDIC-compatible versions of the ksh_isXXXXX() and
  ksh_toXXXXX() macros

* Define EBCDIC-compatible versions of the CTRL(), UNCTRL() and ISCTRL()
  macros. Because the C0 control characters are mapped directly from
  ASCII printable characters, I figured a full-nine-yards translation
  table was needed for CTRL() and UNCTRL(), and so those are implemented
  as functions for speed.

  Because a function cannot be used in a switch statement nor (for xlc)
  array initialization, I added a new CCTRL() macro (the extra "C" is
  for "constant") that contains an expression that can be evaluated at
  compile time. To keep this macro at a reasonable size, however, it
  only accepts one "slice" of ASCII characters---CCTRL('A') should be
  used instead of CCTRL('a'), and so on.

  I implemented ISCTRL() as a macro. The definition is valid, but
  EBCDIC has many more control characters than ASCII has C0 codes, so
  it matches more characters than the ASCII version does.
  See the commented-out definition of ebcdic_isctrl() in misc.c for an
  alternate implementation

* The ebcdic_*() functions should probably be renamed to better fit mksh
  conventions
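
To illustrate why ORD() and the extra alphabet check matter, here is a
small sketch. ORD() is copied from the patch; the two islower variants
are hypothetical and use hard-coded EBCDIC 1047 code points (an
assumed code page). Note that the patch itself combines a range test
with ksh_isalphx(); the explicit three-segment test below is a
different technique, used here only because it is self-contained.

```c
#define ORD(c)	((int)(unsigned char)(c))	/* as in the patch */

/* Range-only test: wrong on EBCDIC, it also accepts the gaps. */
static int
naive_islower(int c)
{
	c = ORD(c);
	return c >= 0x81 && c <= 0xA9;		/* 'a' .. 'z' */
}

/* Segment-aware test for EBCDIC 1047 lowercase letters. */
static int
segmented_islower(int c)
{
	c = ORD(c);
	return (c >= 0x81 && c <= 0x89) ||	/* a-i */
	    (c >= 0x91 && c <= 0x99) ||		/* j-r */
	    (c >= 0xA2 && c <= 0xA9);		/* s-z */
}
```

Code point 0x8F (the plus-minus sign in EBCDIC 1047) passes the naive
test but not the segmented one. Likewise, a signed char holding 'A'
(0xC1) has the value -63, and ORD() brings it back to 193.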

+++ edit.c (back to patch order)

* I don't understand exactly what is_mfs() is used for, but I'm pretty
  sure we can't do the & 0x80 with EBCDIC (note that e.g. 'A' == 0xC1)

* Use CCTRL() instead of CTRL() in this array initialization. I did not
  adjust the indentation of the adjacent lines to keep the patch size
  down, but that will be desirable

* Don't know much about XFUNC_VALUE(), but that & 0x7F looks un-kosher
  for EBCDIC

* Don't use lowercase letters in CCTRL()
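
The reason CCTRL() works where a function call does not can be shown
with a reduced stand-in. CCTRL_SKETCH() below carries only three of the
keys (values taken from the patch's table); default_keys and
describe_key() are hypothetical names for illustration.

```c
/*
 * Reduced stand-in for the patch's CCTRL(): a chain of ?: on character
 * constants is an integer constant expression, so the compiler can
 * evaluate it at compile time.
 */
#define CCTRL_SKETCH(x)	((x) == 'D' ? 0x37 : \
			 (x) == 'H' ? 0x16 : \
			 (x) == 'L' ? 0x0C : \
			 0)

/* Usable in a static array initializer ... */
static const int default_keys[] = { CCTRL_SKETCH('D'), CCTRL_SKETCH('H') };

/* ... and as a case label, neither of which accepts a function call. */
static const char *
describe_key(int c)
{
	switch (c) {
	case CCTRL_SKETCH('D'):
		return "eot";
	case CCTRL_SKETCH('H'):
		return "backspace";
	case CCTRL_SKETCH('L'):
		return "redraw";
	default:
		return "other";
	}
}
```

Because the macro lists only the uppercase slice, CCTRL_SKETCH('d')
yields 0 rather than 0x37, which is why the lowercase arguments had to
be changed.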

+++ main.c

* Need to initialize the EBCDIC escape translation tables at startup

+++ misc.c

* IBM z/OS already has a function named cclass():

    $ grep cclass /usr/include/*.h
    /usr/include/collate.h:   int      cclass(char *, collel_t **);

  The signature clash was causing the build to break

* Don't assume the A-Z alphabet is contiguous

* Relocated the comment about escape portability to sh.h

* Control-key mapping escape table implementation, including the bit of
  Perl I used to generate the actual mapping code. The control code for
  '?' is handled as a special case (but could be incorporated into the
  Perl if desired)

* Note the commented-out ebcdic_isctrl() function. This may or may not
  be preferable to the EBCDIC ISCTRL() macro currently in sh.h. Be
  aware that this function will return true for a lot fewer inputs
  than the macro
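
The table construction can be sketched on an abridged map as follows.
This is an illustration under assumptions, not the code from the
patch: the names ctrl_map, ctrl_table, unctrl_table and init_tables()
are hypothetical, the sketch picks the canonical key by its position
in each triple (i % 3), and it sizes the tables at UCHAR_MAX + 1 so
that index 255 is valid.

```c
#include <limits.h>

#define ORD(c)	((int)(unsigned char)(c))

/* { key, EBCDIC control code } in triples; abridged to three rows. */
static const int ctrl_map[][2] = {
	{ 'D', 0x37 }, { '$', 0x37 }, { 'd', 0x37 },
	{ 'H', 0x16 }, { '(', 0x16 }, { 'h', 0x16 },
	{ 'L', 0x0C }, { ',', 0x0C }, { 'l', 0x0C },
	{ '\0', 0 }
};

/* Static storage, so both tables start out zero-filled. */
static int ctrl_table[UCHAR_MAX + 1];
static int unctrl_table[UCHAR_MAX + 1];

static void
init_tables(void)
{
	int i;

	for (i = 0; ctrl_map[i][0] != '\0'; i++) {
		int ch = ORD(ctrl_map[i][0]);
		int cc = ctrl_map[i][1];

		/* every key in the triple maps to the same control code */
		ctrl_table[ch] = cc;
		/* only the first key of each triple is the reverse mapping */
		if (i % 3 == 0)
			unctrl_table[cc] = ch;
	}
}
```

After init_tables(), looking up 'd' or 'D' in ctrl_table gives 0x37,
and looking up 0x37 in unctrl_table gives 'D'.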

+++ var.c

* Check for upper/lowercase 'X' without resorting to ASCII trickery

* Use the ORD() macro so that these subtractions don't inadvertently
  become additions
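
For comparison, here is a character-set-neutral way to test for the
hex prefix and to compute a digit's value. This is a swapped-in
technique (a lookup in a digit string), not what the patch does; the
patch keeps the subtraction and uses ksh_tolower() and ORD(). The
names is_hex_prefix() and digit_value() are hypothetical.

```c
#include <string.h>

/* Test for 'x' or 'X' without relying on the ASCII-only "| 0x20" trick. */
static int
is_hex_prefix(int c)
{
	return c == 'x' || c == 'X';
}

/*
 * Returns the value of digit character c in the given base (2..36),
 * or -1 if c is not a valid digit in that base. The position in the
 * string gives the value, so the letters need not be contiguous.
 */
static int
digit_value(int c, int base)
{
	static const char lower[] = "0123456789abcdefghijklmnopqrstuvwxyz";
	static const char upper[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
	const char *p;
	int v;

	if (c == '\0')
		return -1;	/* strchr() would match the terminator */
	if ((p = strchr(lower, c)) != NULL)
		v = (int)(p - lower);
	else if ((p = strchr(upper, c)) != NULL)
		v = (int)(p - upper);
	else
		return -1;
	return v < base ? v : -1;
}
```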


I will be happy to provide further testing and answer any questions
as needed.


--Daniel


P.S.: Please Cc: me in any replies, as I am not subscribed to this list.

-- 
Daniel Richard G. || sk...@iskunk.org
My ASCII-art .sig got a bad case of Times New Roman.

Index: Build.sh
===================================================================
RCS file: /cvs/src/bin/mksh/Build.sh,v
retrieving revision 1.674
diff -u -r1.674 Build.sh
--- Build.sh	19 Apr 2015 18:50:59 -0000	1.674
+++ Build.sh	24 Apr 2015 02:12:22 -0000
@@ -419,7 +419,11 @@
 		na=0
 	fi
 	hf=$1; shift
-	hv=`echo "$hf" | tr -d '\012\015' | tr -c $alll$allu$alln $alls`
+	case "$TARGET_OS" in
+		OS/390) lfcr='\n\r' ;;	# EBCDIC goofiness
+		*) lfcr='\012\015' ;;
+	esac
+	hv=`echo "$hf" | tr -d "$lfcr" | tr -c $alll$allu$alln $alls`
 	echo "/* NeXTstep bug workaround */" >x
 	for i
 	do
@@ -577,7 +581,7 @@
 	echo "$me: Error: ./$tfn is a directory!" >&2
 	exit 1
 fi
-rmf a.exe* a.out* conftest.c *core core.* lft ${tfn}* no *.bc *.ll *.o *.gen \
+rmf a.exe* a.out* conftest.c *core core.* lft ${tfn}* no *.bc *.dbg *.ll *.o *.gen \
     Rebuild.sh signames.inc test.sh x vv.out
 
 SRCS="lalloc.c eval.c exec.c expr.c funcs.c histrap.c jobs.c"
@@ -829,6 +833,12 @@
 OpenBSD)
 	: ${HAVE_SETLOCALE_CTYPE=0}
 	;;
+OS/390)
+	SIZE=:	# not available
+	add_cppflags -DNSIG=32
+	add_cppflags -D_ALL_SOURCE
+	oswarn="; EBCDIC support is incomplete"
+	;;
 OSF1)
 	HAVE_SIG_T=0	# incompatible
 	add_cppflags -D_OSF_SOURCE
@@ -929,6 +939,7 @@
 
 : ${AWK=awk} ${CC=cc} ${NROFF=nroff} ${SIZE=size}
 test 0 = $r && echo | $NROFF -v 2>&1 | grep GNU >/dev/null 2>&1 && \
+	echo | $NROFF -c >/dev/null 2>&1 && \
     NROFF="$NROFF -c"
 
 # this aids me in tracing FTBFSen without access to the buildd
@@ -1327,8 +1338,16 @@
 	DOWARN=-Wc,-we
 	;;
 xlc)
-	save_NOWARN=-qflag=i:e
-	DOWARN=-qflag=i:i
+	case "$TARGET_OS" in
+	OS/390)
+		save_NOWARN=-qflag=e
+		DOWARN=-qflag=i
+		;;
+	*)
+		save_NOWARN=-qflag=i:e
+		DOWARN=-qflag=i:i
+		;;
+	esac
 	;;
 *)
 	test x"$save_NOWARN" = x"" && save_NOWARN=-Wno-error
@@ -1493,10 +1512,25 @@
 	ac_flags 1 extansi -Xa
 	;;
 xlc)
-	ac_flags 1 rodata "-qro -qroconst -qroptr"
-	ac_flags 1 rtcheck -qcheck=all
-	#ac_flags 1 rtchkc -qextchk	# reported broken
-	ac_flags 1 wformat "-qformat=all -qformat=nozln"
+	case "$TARGET_OS" in
+	OS/390)
+		# On IBM z/OS, the following are warnings by default
+		# CCN3296: #include file <foo.h> not found.
+		# CCN3944: Attribute "__foo__" is not supported and is ignored.
+		# CCN3963: The attribute "foo" is not a valid variable
+		#          attribute and is ignored.
+		ac_flags 1 halton "-qhaltonmsg=CCN3296 -qhaltonmsg=CCN3944 -qhaltonmsg=CCN3963"
+		# CCN3290: Unknown macro name FOO on #undef directive.
+		# CCN4108: The use of keyword '__attribute__' is non-portable.
+		ac_flags 1 supprss "-qsuppress=CCN3290 -qsuppress=CCN4108"
+		;;
+	*)
+		ac_flags 1 rodata "-qro -qroconst -qroptr"
+		ac_flags 1 rtcheck -qcheck=all
+		#ac_flags 1 rtchkc -qextchk	# reported broken
+		ac_flags 1 wformat "-qformat=all -qformat=nozln"
+		;;
+	esac
 	#ac_flags 1 wp64 -qwarn64	# too verbose for now
 	;;
 esac
@@ -2628,8 +2662,8 @@
 MKSH_ASSUME_UTF8		(0=disabled, 1=enabled; default: unset)
 MKSH_BINSHPOSIX			if */sh or */-sh, enable set -o posix
 MKSH_BINSHREDUCED		if */sh or */-sh, enable set -o sh
-MKSH_CLRTOEOL_STRING		"\033[K"
-MKSH_CLS_STRING			"\033[;H\033[J"
+MKSH_CLRTOEOL_STRING		"\033[K" (replace \033 with \047 on EBCDIC)
+MKSH_CLS_STRING			"\033[;H\033[J" (likewise)
 MKSH_CONSERVATIVE_FDS		fd 0-9 for scripts, shell only up to 31
 MKSH_DEFAULT_EXECSHELL		"/bin/sh" (do not change)
 MKSH_DEFAULT_PROFILEDIR		"/etc" (do not change)
Index: check.pl
===================================================================
RCS file: /cvs/src/bin/mksh/check.pl,v
retrieving revision 1.38
diff -u -r1.38 check.pl
--- check.pl	8 Mar 2015 22:54:55 -0000	1.38
+++ check.pl	24 Apr 2015 02:12:22 -0000
@@ -1165,7 +1165,7 @@
 		print STDERR "$prog:$test{':long-name'}: expected-exit value $val not in 0..255\n";
 		return undef;
 	    }
-	} elsif ($val !~ /^([\s<>+-=*%\/&|!()]|\b[wse]\b|\bSIG[A-Z][A-Z0-9]*\b)+$/) {
+	} elsif ($val !~ /^([\s\d<>+-=*%\/&|!()]|\b[wse]\b|\bSIG[A-Z][A-Z0-9]*\b)+$/) {
 	    print STDERR "$prog:$test{':long-name'}: bad expected-exit expression: $val\n";
 	    return undef;
 	}
Index: check.t
===================================================================
RCS file: /cvs/src/bin/mksh/check.t,v
retrieving revision 1.690
diff -u -r1.690 check.t
--- check.t	19 Apr 2015 19:18:27 -0000	1.690
+++ check.t	24 Apr 2015 02:12:24 -0000
@@ -1216,7 +1216,7 @@
 	cd -P$1 subdir
 	echo 2=$?,${PWD#$bwd/}
 	cd $bwd
-	chmod 755 renamed
+	chmod 755 noread renamed 2>/dev/null
 	rm -rf noread link renamed
 stdin:
 	export TSHELL="$__progname"
Index: edit.c
===================================================================
RCS file: /cvs/src/bin/mksh/edit.c,v
retrieving revision 1.284
diff -u -r1.284 edit.c
--- edit.c	11 Apr 2015 22:10:12 -0000	1.284
+++ edit.c	24 Apr 2015 02:12:25 -0000
@@ -37,10 +37,10 @@
  * which do a full power cycle then...
  */
 #ifndef MKSH_CLS_STRING
-#define MKSH_CLS_STRING		"\033[;H\033[J"
+#define MKSH_CLS_STRING		MKSH_ESCAPE_STRING "[;H" MKSH_ESCAPE_STRING "[J"
 #endif
 #ifndef MKSH_CLRTOEOL_STRING
-#define MKSH_CLRTOEOL_STRING	"\033[K"
+#define MKSH_CLRTOEOL_STRING	MKSH_ESCAPE_STRING "[K"
 #endif
 
 /* tty driver characters we are interested in */
@@ -885,7 +885,11 @@
 /* Separator for completion */
 #define	is_cfs(c)	((c) == ' ' || (c) == '\t' || (c) == '"' || (c) == '\'')
 /* Separator for motion */
-#define	is_mfs(c)	(!(ksh_isalnux(c) || (c) == '$' || ((c) & 0x80)))
+#ifdef MKSH_EBCDIC
+# define is_mfs(c)	(!(ksh_isalnux(c) || (c) == '$'))
+#else
+# define is_mfs(c)	(!(ksh_isalnux(c) || (c) == '$' || ((c) & 0x80)))
+#endif
 
 #define X_NTABS		3			/* normal, meta1, meta2 */
 #define X_TABSZ		256			/* size of keydef tables etc */
@@ -1010,56 +1014,56 @@
 };
 
 static struct x_defbindings const x_defbindings[] = {
-	{ XFUNC_del_back,		0, CTRL('?')	},
-	{ XFUNC_del_bword,		1, CTRL('?')	},
-	{ XFUNC_eot_del,		0, CTRL('D')	},
-	{ XFUNC_del_back,		0, CTRL('H')	},
-	{ XFUNC_del_bword,		1, CTRL('H')	},
+	{ XFUNC_del_back,		0, CCTRL('?')	},
+	{ XFUNC_del_bword,		1, CCTRL('?')	},
+	{ XFUNC_eot_del,		0, CCTRL('D')	},
+	{ XFUNC_del_back,		0, CCTRL('H')	},
+	{ XFUNC_del_bword,		1, CCTRL('H')	},
 	{ XFUNC_del_bword,		1,	'h'	},
 	{ XFUNC_mv_bword,		1,	'b'	},
 	{ XFUNC_mv_fword,		1,	'f'	},
 	{ XFUNC_del_fword,		1,	'd'	},
-	{ XFUNC_mv_back,		0, CTRL('B')	},
-	{ XFUNC_mv_forw,		0, CTRL('F')	},
-	{ XFUNC_search_char_forw,	0, CTRL(']')	},
-	{ XFUNC_search_char_back,	1, CTRL(']')	},
-	{ XFUNC_newline,		0, CTRL('M')	},
-	{ XFUNC_newline,		0, CTRL('J')	},
-	{ XFUNC_end_of_text,		0, CTRL('_')	},
-	{ XFUNC_abort,			0, CTRL('G')	},
-	{ XFUNC_prev_com,		0, CTRL('P')	},
-	{ XFUNC_next_com,		0, CTRL('N')	},
-	{ XFUNC_nl_next_com,		0, CTRL('O')	},
-	{ XFUNC_search_hist,		0, CTRL('R')	},
+	{ XFUNC_mv_back,		0, CCTRL('B')	},
+	{ XFUNC_mv_forw,		0, CCTRL('F')	},
+	{ XFUNC_search_char_forw,	0, CCTRL(']')	},
+	{ XFUNC_search_char_back,	1, CCTRL(']')	},
+	{ XFUNC_newline,		0, CCTRL('M')	},
+	{ XFUNC_newline,		0, CCTRL('J')	},
+	{ XFUNC_end_of_text,		0, CCTRL('_')	},
+	{ XFUNC_abort,			0, CCTRL('G')	},
+	{ XFUNC_prev_com,		0, CCTRL('P')	},
+	{ XFUNC_next_com,		0, CCTRL('N')	},
+	{ XFUNC_nl_next_com,		0, CCTRL('O')	},
+	{ XFUNC_search_hist,		0, CCTRL('R')	},
 	{ XFUNC_beg_hist,		1,	'<'	},
 	{ XFUNC_end_hist,		1,	'>'	},
 	{ XFUNC_goto_hist,		1,	'g'	},
-	{ XFUNC_mv_end,			0, CTRL('E')	},
-	{ XFUNC_mv_begin,		0, CTRL('A')	},
-	{ XFUNC_draw_line,		0, CTRL('L')	},
-	{ XFUNC_cls,			1, CTRL('L')	},
-	{ XFUNC_meta1,			0, CTRL('[')	},
-	{ XFUNC_meta2,			0, CTRL('X')	},
-	{ XFUNC_kill,			0, CTRL('K')	},
-	{ XFUNC_yank,			0, CTRL('Y')	},
+	{ XFUNC_mv_end,			0, CCTRL('E')	},
+	{ XFUNC_mv_begin,		0, CCTRL('A')	},
+	{ XFUNC_draw_line,		0, CCTRL('L')	},
+	{ XFUNC_cls,			1, CCTRL('L')	},
+	{ XFUNC_meta1,			0, CCTRL('[')	},
+	{ XFUNC_meta2,			0, CCTRL('X')	},
+	{ XFUNC_kill,			0, CCTRL('K')	},
+	{ XFUNC_yank,			0, CCTRL('Y')	},
 	{ XFUNC_meta_yank,		1,	'y'	},
-	{ XFUNC_literal,		0, CTRL('^')	},
+	{ XFUNC_literal,		0, CCTRL('^')	},
 	{ XFUNC_comment,		1,	'#'	},
-	{ XFUNC_transpose,		0, CTRL('T')	},
-	{ XFUNC_complete,		1, CTRL('[')	},
-	{ XFUNC_comp_list,		0, CTRL('I')	},
+	{ XFUNC_transpose,		0, CCTRL('T')	},
+	{ XFUNC_complete,		1, CCTRL('[')	},
+	{ XFUNC_comp_list,		0, CCTRL('I')	},
 	{ XFUNC_comp_list,		1,	'='	},
 	{ XFUNC_enumerate,		1,	'?'	},
 	{ XFUNC_expand,			1,	'*'	},
-	{ XFUNC_comp_file,		1, CTRL('X')	},
-	{ XFUNC_comp_comm,		2, CTRL('[')	},
+	{ XFUNC_comp_file,		1, CCTRL('X')	},
+	{ XFUNC_comp_comm,		2, CCTRL('[')	},
 	{ XFUNC_list_comm,		2,	'?'	},
-	{ XFUNC_list_file,		2, CTRL('Y')	},
+	{ XFUNC_list_file,		2, CCTRL('Y')	},
 	{ XFUNC_set_mark,		1,	' '	},
-	{ XFUNC_kill_region,		0, CTRL('W')	},
-	{ XFUNC_xchg_point_mark,	2, CTRL('X')	},
-	{ XFUNC_literal,		0, CTRL('V')	},
-	{ XFUNC_version,		1, CTRL('V')	},
+	{ XFUNC_kill_region,		0, CCTRL('W')	},
+	{ XFUNC_xchg_point_mark,	2, CCTRL('X')	},
+	{ XFUNC_literal,		0, CCTRL('V')	},
+	{ XFUNC_version,		1, CCTRL('V')	},
 	{ XFUNC_prev_histword,		1,	'.'	},
 	{ XFUNC_prev_histword,		1,	'_'	},
 	{ XFUNC_set_arg,		1,	'0'	},
@@ -1126,7 +1130,7 @@
 	}
 }
 
-#ifdef MKSH_SMALL
+#if defined(MKSH_SMALL) || defined(MKSH_EBCDIC)
 #define XFUNC_VALUE(f) (f)
 #else
 #define XFUNC_VALUE(f) (f & 0x7F)
@@ -2885,7 +2889,7 @@
 		} else
 			x_putc(c);
 		switch (c) {
-		case 7:
+		case MKSH_BELL_CHAR:
 			break;
 		case '\r':
 		case '\n':
@@ -2921,7 +2925,7 @@
 			x_putc(c);
 		}
 		switch (c) {
-		case 7:
+		case MKSH_BELL_CHAR:
 			break;
 		case '\r':
 		case '\n':
@@ -3976,7 +3980,7 @@
 	case '\n':
 		return (1);
 
-	case CTRL('['):
+	case CCTRL('['):
 		expanded = NONE;
 		if (first_insert) {
 			first_insert = false;
@@ -3994,19 +3998,19 @@
 			return (redo_insert(lastac - 1));
 
 	/* { Begin nonstandard vi commands */
-	case CTRL('x'):
+	case CCTRL('X'):
 		expand_word(0);
 		break;
 
-	case CTRL('f'):
+	case CCTRL('F'):
 		complete_word(0, 0);
 		break;
 
-	case CTRL('e'):
+	case CCTRL('E'):
 		print_expansions(es, 0);
 		break;
 
-	case CTRL('i'):
+	case CCTRL('I'):
 		if (Flag(FVITABCOMPLETE)) {
 			complete_word(0, 0);
 			break;
@@ -4061,8 +4065,8 @@
 		}
 		switch (*cmd) {
 
-		case CTRL('l'):
-		case CTRL('r'):
+		case CCTRL('L'):
+		case CCTRL('R'):
 			redraw_line(true);
 			break;
 
@@ -4249,7 +4253,7 @@
 
 		case 'j':
 		case '+':
-		case CTRL('n'):
+		case CCTRL('N'):
 			if (grabhist(modified, hnum + argcnt) < 0)
 				return (-1);
 			else {
@@ -4260,7 +4264,7 @@
 
 		case 'k':
 		case '-':
-		case CTRL('p'):
+		case CCTRL('P'):
 			if (grabhist(modified, hnum - argcnt) < 0)
 				return (-1);
 			else {
@@ -4492,26 +4496,26 @@
 		/* AT&T ksh */
 		case '=':
 		/* Nonstandard vi/ksh */
-		case CTRL('e'):
+		case CCTRL('E'):
 			print_expansions(es, 1);
 			break;
 
 
 		/* Nonstandard vi/ksh */
-		case CTRL('i'):
+		case CCTRL('I'):
 			if (!Flag(FVITABCOMPLETE))
 				return (-1);
 			complete_word(1, argcnt);
 			break;
 
 		/* some annoying AT&T kshs */
-		case CTRL('['):
+		case CCTRL('['):
 			if (!Flag(FVIESCCOMPLETE))
 				return (-1);
 		/* AT&T ksh */
 		case '\\':
 		/* Nonstandard vi/ksh */
-		case CTRL('f'):
+		case CCTRL('F'):
 			complete_word(1, argcnt);
 			break;
 
@@ -4519,7 +4523,7 @@
 		/* AT&T ksh */
 		case '*':
 		/* Nonstandard vi/ksh */
-		case CTRL('x'):
+		case CCTRL('X'):
 			expand_word(1);
 			break;
 
@@ -4598,7 +4602,7 @@
 		break;
 
 	case 'h':
-	case CTRL('h'):
+	case CCTRL('H'):
 		if (!sub && es->cursor == 0)
 			return (-1);
 		ncursor = es->cursor - argcnt;
Index: main.c
===================================================================
RCS file: /cvs/src/bin/mksh/main.c,v
retrieving revision 1.292
diff -u -r1.292 main.c
--- main.c	19 Apr 2015 18:51:01 -0000	1.292
+++ main.c	24 Apr 2015 02:12:25 -0000
@@ -282,6 +282,10 @@
 
 	initctypes();
 
+#ifdef MKSH_EBCDIC
+	initebcdic();
+#endif
+
 	inittraps();
 
 	coproc_init();
Index: misc.c
===================================================================
RCS file: /cvs/src/bin/mksh/misc.c,v
retrieving revision 1.226
diff -u -r1.226 misc.c
--- misc.c	20 Mar 2015 21:47:04 -0000	1.226
+++ misc.c	24 Apr 2015 02:12:25 -0000
@@ -52,7 +52,7 @@
     const unsigned char *, bool) MKSH_A_PURE;
 static int do_gmatch(const unsigned char *, const unsigned char *,
     const unsigned char *, const unsigned char *) MKSH_A_PURE;
-static const unsigned char *cclass(const unsigned char *, unsigned char)
+static const unsigned char *mksh_cclass(const unsigned char *, unsigned char)
     MKSH_A_PURE;
 #ifdef KSH_CHVT_CODE
 static void chvt(const Getopt *);
@@ -93,12 +93,8 @@
 void
 initctypes(void)
 {
-	int c;
-
-	for (c = 'a'; c <= 'z'; c++)
-		chtypes[c] |= C_ALPHA;
-	for (c = 'A'; c <= 'Z'; c++)
-		chtypes[c] |= C_ALPHA;
+	setctypes("abcdefghijklmnopqrstuvwxyz", C_ALPHA);
+	setctypes("ABCDEFGHIJKLMNOPQRSTUVWXYZ", C_ALPHA);
 	chtypes['_'] |= C_ALPHA;
 	setctypes("0123456789", C_DIGIT);
 	/* \0 added automatically */
@@ -776,7 +772,7 @@
 		}
 		switch (*p++) {
 		case '[':
-			if (sc == 0 || (p = cclass(p, sc)) == NULL)
+			if (sc == 0 || (p = mksh_cclass(p, sc)) == NULL)
 				return (0);
 			break;
 
@@ -889,7 +885,7 @@
 }
 
 static const unsigned char *
-cclass(const unsigned char *p, unsigned char sub)
+mksh_cclass(const unsigned char *p, unsigned char sub)
 {
 	unsigned char c, d;
 	bool notp, found = false;
@@ -1159,7 +1155,7 @@
 			++p;
 			switch (c) {
 			/* see unbksl() in this file for comments */
-			case 7:
+			case MKSH_BELL_CHAR:
 				c = 'a';
 				if (0)
 					/* FALLTHROUGH */
@@ -1183,11 +1179,11 @@
 				  c = 't';
 				if (0)
 					/* FALLTHROUGH */
-			case 11:
+			case MKSH_VTAB_CHAR:
 				  c = 'v';
 				if (0)
 					/* FALLTHROUGH */
-			case '\033':
+			case MKSH_ESCAPE_CHAR:
 				/* take E not e because \e is \ in *roff */
 				  c = 'E';
 				/* FALLTHROUGH */
@@ -1197,7 +1193,11 @@
 				if (0)
 					/* FALLTHROUGH */
 			default:
+#ifdef MKSH_EBCDIC
+				  if (c < 64 || c == 0xFF) {
+#else
 				  if (c < 32 || c > 0x7E) {
+#endif
 					/* FALLTHROUGH */
 			case '\'':
 					shf_fprintf(shf, "\\%03o", c);
@@ -2136,13 +2136,7 @@
 	fc = (*fg)();
 	switch (fc) {
 	case 'a':
-		/*
-		 * according to the comments in pdksh, \007 seems
-		 * to be more portable than \a (due to HP-UX cc,
-		 * Ultrix cc, old pcc, etc.) so we avoid the escape
-		 * sequence altogether in mksh and assume ASCII
-		 */
-		wc = 7;
+		wc = MKSH_BELL_CHAR;
 		break;
 	case 'b':
 		wc = '\b';
@@ -2155,7 +2149,7 @@
 		break;
 	case 'E':
 	case 'e':
-		wc = 033;
+		wc = MKSH_ESCAPE_CHAR;
 		break;
 	case 'f':
 		wc = '\f';
@@ -2170,8 +2164,7 @@
 		wc = '\t';
 		break;
 	case 'v':
-		/* assume ASCII here as well */
-		wc = 11;
+		wc = MKSH_VTAB_CHAR;
 		break;
 	case '1':
 	case '2':
@@ -2253,3 +2246,144 @@
 
 	return (wc);
 }
+
+#ifdef MKSH_EBCDIC
+
+/*
+ * The mapping of control keys to C0 control codes (e.g. Ctrl-D => EOT)
+ * is directly tied to ASCII, so for EBCDIC we have no option but to use
+ * conversion tables if we want to keep the same mapping.
+ *
+ * The mappings are generated with the following Perl code:
+ *
+--------------------------------
+#!/usr/bin/env perl
+
+use Text::Iconv;
+use strict;
+
+my $cvt = Text::Iconv->new("iso8859-1", "ibm-1047");
+
+sub fmtchr($)
+{
+	my $n = $_[0];
+	my $c = chr($n);
+	$c = "\\$c" if $c =~ /['\\]/;
+	$c = "\\177" if $n eq 0x7F;
+	return $c;
+}
+
+# generate ebcdic_ctrl_map[][] array
+for (my $i = 0x40; $i <= 0x5F; $i++)
+{
+	my $a = chr($i & 0x1f);
+	my $e = $cvt->convert($a);
+	printf "\t{ '%s', 0x%02X }, { '%s', 0x%02X }, { '%s', 0x%02X },\n",
+		fmtchr($i), ord($e),
+		fmtchr($i - 0x20), ord($e),
+		fmtchr($i + 0x20), ord($e);
+}
+
+# generate CCTRL() macro
+for (my $i = 0x41; $i <= 0x5F; $i++)
+{
+	my $a = chr($i & 0x1f);
+	my $e = $cvt->convert($a);
+	printf "\t\t\t (x) == '%s' ? 0x%02X : \\\n", fmtchr($i), ord($e);
+}
+--------------------------------
+ *
+ */
+
+static const int ebcdic_ctrl_map[][2] = {
+	/*
+	 * ebcdic_ctrl_map[x][0] = ASCII control key
+	 * ebcdic_ctrl_map[x][1] = EBCDIC control code
+	 *
+	 * Example: Ctrl-D -> 'D' -> ASCII 0x04 -> EOT -> EBCDIC 0x37
+	 *
+	 * Column 1 represents the standard ASCII control keys
+	 * Column 2 is the col. 1 set shifted minus 0x20
+	 * Column 3 is the col. 1 set shifted plus 0x20
+	 *
+	 * Note: '?' is special-cased below
+	 */
+	{ '@', 0x00 }, { ' ', 0x00 }, { '`', 0x00 },
+	{ 'A', 0x01 }, { '!', 0x01 }, { 'a', 0x01 },
+	{ 'B', 0x02 }, { '"', 0x02 }, { 'b', 0x02 },
+	{ 'C', 0x03 }, { '#', 0x03 }, { 'c', 0x03 },
+	{ 'D', 0x37 }, { '$', 0x37 }, { 'd', 0x37 },
+	{ 'E', 0x2D }, { '%', 0x2D }, { 'e', 0x2D },
+	{ 'F', 0x2E }, { '&', 0x2E }, { 'f', 0x2E },
+	{ 'G', 0x2F }, { '\'',0x2F }, { 'g', 0x2F },
+	{ 'H', 0x16 }, { '(', 0x16 }, { 'h', 0x16 },
+	{ 'I', 0x05 }, { ')', 0x05 }, { 'i', 0x05 },
+	{ 'J', 0x25 }, { '*', 0x25 }, { 'j', 0x25 },
+	{ 'K', 0x0B }, { '+', 0x0B }, { 'k', 0x0B },
+	{ 'L', 0x0C }, { ',', 0x0C }, { 'l', 0x0C },
+	{ 'M', 0x0D }, { '-', 0x0D }, { 'm', 0x0D },
+	{ 'N', 0x0E }, { '.', 0x0E }, { 'n', 0x0E },
+	{ 'O', 0x0F }, { '/', 0x0F }, { 'o', 0x0F },
+	{ 'P', 0x10 }, { '0', 0x10 }, { 'p', 0x10 },
+	{ 'Q', 0x11 }, { '1', 0x11 }, { 'q', 0x11 },
+	{ 'R', 0x12 }, { '2', 0x12 }, { 'r', 0x12 },
+	{ 'S', 0x13 }, { '3', 0x13 }, { 's', 0x13 },
+	{ 'T', 0x3C }, { '4', 0x3C }, { 't', 0x3C },
+	{ 'U', 0x3D }, { '5', 0x3D }, { 'u', 0x3D },
+	{ 'V', 0x32 }, { '6', 0x32 }, { 'v', 0x32 },
+	{ 'W', 0x26 }, { '7', 0x26 }, { 'w', 0x26 },
+	{ 'X', 0x18 }, { '8', 0x18 }, { 'x', 0x18 },
+	{ 'Y', 0x19 }, { '9', 0x19 }, { 'y', 0x19 },
+	{ 'Z', 0x3F }, { ':', 0x3F }, { 'z', 0x3F },
+	{ '[', 0x27 }, { ';', 0x27 }, { '{', 0x27 },
+	{ '\\',0x1C }, { '<', 0x1C }, { '|', 0x1C },
+	{ ']', 0x1D }, { '=', 0x1D }, { '}', 0x1D },
+	{ '^', 0x1E }, { '>', 0x1E }, { '~', 0x1E },
+	{ '_', 0x1F }, { '?', 0x1F }, { '\177', 0x1F },
+	{ '\0', 0 }
+};
+
+static int ebcdic_ctrl_table[UCHAR_MAX + 1] = { 0 };
+static int ebcdic_unctrl_table[UCHAR_MAX + 1] = { 0 };
+
+void
+initebcdic(void)
+{
+	int i;
+
+	for (i = 0; i <= UCHAR_MAX; i++) {
+		ebcdic_ctrl_table[i] = 0;
+		ebcdic_unctrl_table[i] = 0;
+	}
+
+	for (i = 0; ebcdic_ctrl_map[i][0]; i++) {
+		int ch = ORD(ebcdic_ctrl_map[i][0]);
+		int cc = ebcdic_ctrl_map[i][1];
+		ebcdic_ctrl_table[ch] = cc;
+		if (i % 3 == 0)
+			ebcdic_unctrl_table[cc] = ch;
+	}
+
+	/* special case */
+	ebcdic_ctrl_table[ORD('?')] = 0x07;
+	ebcdic_unctrl_table[0x07] = '?';
+}
+
+int ebcdic_ctrl(int c)
+{
+	return ebcdic_ctrl_table[ORD(c)];
+}
+
+int ebcdic_unctrl(int c)
+{
+	return ebcdic_unctrl_table[ORD(c)];
+}
+
+/*
+int ebcdic_isctrl(int c)
+{
+	return c == 0 || ebcdic_unctrl_table[ORD(c)] != 0;
+}
+*/
+
+#endif /* MKSH_EBCDIC */
Index: sh.h
===================================================================
RCS file: /cvs/src/bin/mksh/sh.h,v
retrieving revision 1.725
diff -u -r1.725 sh.h
--- sh.h	19 Apr 2015 19:18:31 -0000	1.725
+++ sh.h	24 Apr 2015 02:12:26 -0000
@@ -251,6 +251,64 @@
 
 #ifndef MKSH_INCLUDES_ONLY
 
+/*
+ * Many headaches with EBCDIC:
+ * 1. There are numerous EBCDIC variants, and it is not feasible for us 
+ *    to support them all. But we can support the EBCDIC code pages that
+ *    contain all (most?) of the characters in ASCII, and these
+ *    thankfully tend to agree on the code points assigned to the ASCII
+ *    subset. If you need a representative example, look at EBCDIC 1047,
+ *    which is first among equals in the IBM MVS development 
+ *    environment: http://en.wikipedia.org/wiki/EBCDIC_1047
+ * 2. Character ranges that are contiguous in ASCII, like the letters
+ *    in [A-Z], are broken up into segments (i.e. [A-IJ-RS-Z]), so we 
+ *    can't implement e.g. islower() as { return c >= 'a' && c <= 'z'; }
+ *    because it will also return true for a handful of extraneous
+ *    characters (like the plus-minus sign at 0x8F in EBCDIC 1047, a
+ *    little after 'i'). But at least '_' is not one of these.
+ * 3. The normal [0-9A-Za-z] characters are at codepoints beyond 0x80.
+ *    Not only do they require all 8 bits instead of 7, if chars are 
+ *    signed, they will have negative integer values! Something like
+ *    (c - 'A') could actually become (c + 63)! Use the ORD() macro to 
+ *    ensure you're getting a value in [0, 255].
+ * 4. '\n' is actually NL (0x15, U+0085) instead of LF (0x25, U+000A).
+ *    EBCDIC has a proper newline character instead of "emulating" one 
+ *    with line feeds.
+ * 5. Note that it is possible to compile programs in ASCII mode on IBM
+ *    mainframe systems, using the -qascii option to the XL C compiler.
+ *    We can determine the build mode by looking at __CHARSET_LIB:
+ *    0 == EBCDIC, 1 == ASCII
+ */
+#if defined(__MVS__) && defined(__IBMC__) && !defined(MKSH_EBCDIC)
+# if defined(__CHARSET_LIB) && __CHARSET_LIB
+#  ifndef _ENHANCED_ASCII_EXT
+#   define _ENHANCED_ASCII_EXT 0xFFFFFFFF	/* go all-out on ASCII */
+#  endif
+# else
+#  define MKSH_EBCDIC
+# endif
+#endif
+
+#ifdef MKSH_EBCDIC
+/*
+ * use symbolic escapes when possible, let the compiler sort it out
+ */
+# define MKSH_BELL_CHAR '\a'
+# define MKSH_ESCAPE_CHAR '\047'
+# define MKSH_ESCAPE_STRING "\047"
+# define MKSH_VTAB_CHAR '\v'
+#else
+/* 
+ * according to the comments in pdksh, \007 seems to be more portable
+ * than \a (due to HP-UX cc, Ultrix cc, old pcc, etc.) so we avoid the
+ * latter altogether when we're using ASCII
+ */
+# define MKSH_BELL_CHAR '\007'
+# define MKSH_ESCAPE_CHAR '\033'
+# define MKSH_ESCAPE_STRING "\033"
+# define MKSH_VTAB_CHAR '\013'
+#endif
+
 /* extra types */
 
 #if !HAVE_GETRUSAGE
@@ -298,13 +356,24 @@
 	} while (/* CONSTCOND */ 0)
 #endif
 
-#define ksh_isdigit(c)	(((c) >= '0') && ((c) <= '9'))
-#define ksh_islower(c)	(((c) >= 'a') && ((c) <= 'z'))
-#define ksh_isupper(c)	(((c) >= 'A') && ((c) <= 'Z'))
-#define ksh_tolower(c)	(((c) >= 'A') && ((c) <= 'Z') ? (c) - 'A' + 'a' : (c))
-#define ksh_toupper(c)	(((c) >= 'a') && ((c) <= 'z') ? (c) - 'a' + 'A' : (c))
+#define ORD(c)		((int)(unsigned char)(c))
+#ifdef MKSH_EBCDIC
+# define ksh_isdigit(c) (ORD(c) >= ORD('0') && ORD(c) <= ORD('9'))
+# define ksh_islower(c)	(ORD(c) >= ORD('a') && ORD(c) <= ORD('z') && \
+			 ksh_isalphx((c)))
+# define ksh_isupper(c)	(ORD(c) >= ORD('A') && ORD(c) <= ORD('Z') && \
+			 ksh_isalphx((c)))
+# define ksh_isspace(c)	((c) == '\t' || (c) == '\n' || (c) == '\v' || \
+			 (c) == '\f' || (c) == '\r' || (c) == ' ')
+#else
+# define ksh_isdigit(c)	(((c) >= '0') && ((c) <= '9'))
+# define ksh_islower(c)	(((c) >= 'a') && ((c) <= 'z'))
+# define ksh_isupper(c)	(((c) >= 'A') && ((c) <= 'Z'))
+# define ksh_isspace(c)	((((c) >= 0x09) && ((c) <= 0x0D)) || ((c) == 0x20))
+#endif
+#define ksh_tolower(c)	(ksh_isupper((c)) ? (c) - 'A' + 'a' : (c))
+#define ksh_toupper(c)	(ksh_islower((c)) ? (c) - 'a' + 'A' : (c))
 #define ksh_isdash(s)	(((s)[0] == '-') && ((s)[1] == '\0'))
-#define ksh_isspace(c)	((((c) >= 0x09) && ((c) <= 0x0D)) || ((c) == 0x20))
 #define ksh_min(x,y)	((x) < (y) ? (x) : (y))
 #define ksh_max(x,y)	((x) > (y) ? (x) : (y))
 
@@ -460,7 +529,7 @@
  * not a char that is used often. Also, can't use the high bit as it causes
  * portability problems (calling strchr(x, 0x80 | 'x') is error prone).
  */
-#define MAGIC		(7)	/* prefix for *?[!{,} during expand */
+#define MAGIC		(MKSH_BELL_CHAR) /* prefix for *?[!{,} during expand */
 #define ISMAGIC(c)	((unsigned char)(c) == MAGIC)
 
 EXTERN const char *safe_prompt; /* safe prompt if PS1 substitution fails */
@@ -1583,9 +1652,49 @@
 #define HERES		10	/* max number of << in line */
 
 #undef CTRL
-#define	CTRL(x)		((x) == '?' ? 0x7F : (x) & 0x1F)	/* ASCII */
-#define	UNCTRL(x)	((x) ^ 0x40)				/* ASCII */
-#define	ISCTRL(x)	(((signed char)((uint8_t)(x) + 1)) < 33)
+#ifdef MKSH_EBCDIC
+# define CTRL(x)	(ebcdic_ctrl(x))
+# define CCTRL(x)	((x) == '?' ? 0x07 : /* special case */ \
+			 (x) == 'A' ? 0x01 : \
+			 (x) == 'B' ? 0x02 : \
+			 (x) == 'C' ? 0x03 : \
+			 (x) == 'D' ? 0x37 : \
+			 (x) == 'E' ? 0x2D : \
+			 (x) == 'F' ? 0x2E : \
+			 (x) == 'G' ? 0x2F : \
+			 (x) == 'H' ? 0x16 : \
+			 (x) == 'I' ? 0x05 : \
+			 (x) == 'J' ? 0x25 : \
+			 (x) == 'K' ? 0x0B : \
+			 (x) == 'L' ? 0x0C : \
+			 (x) == 'M' ? 0x0D : \
+			 (x) == 'N' ? 0x0E : \
+			 (x) == 'O' ? 0x0F : \
+			 (x) == 'P' ? 0x10 : \
+			 (x) == 'Q' ? 0x11 : \
+			 (x) == 'R' ? 0x12 : \
+			 (x) == 'S' ? 0x13 : \
+			 (x) == 'T' ? 0x3C : \
+			 (x) == 'U' ? 0x3D : \
+			 (x) == 'V' ? 0x32 : \
+			 (x) == 'W' ? 0x26 : \
+			 (x) == 'X' ? 0x18 : \
+			 (x) == 'Y' ? 0x19 : \
+			 (x) == 'Z' ? 0x3F : \
+			 (x) == '[' ? 0x27 : \
+			 (x) == '\\'? 0x1C : \
+			 (x) == ']' ? 0x1D : \
+			 (x) == '^' ? 0x1E : \
+			 (x) == '_' ? 0x1F : \
+			 0)
+# define UNCTRL(x)	(ebcdic_unctrl(x))
+# define ISCTRL(x)	(ORD(x) < 64 || ORD(x) == 0xFF)
+#else
+# define CTRL(x)	((x) == '?' ? 0x7F : (x) & 0x1F)
+# define CCTRL(x)	CTRL(x)
+# define UNCTRL(x)	((x) ^ 0x40)
+# define ISCTRL(x)	(((signed char)((uint8_t)(x) + 1)) < 33)
+#endif
 
 #define IDENT		64
 
@@ -1884,6 +1993,12 @@
 char *strndup_i(const char *, size_t, Area *);
 #endif
 int unbksl(bool, int (*)(void), void (*)(int));
+#ifdef MKSH_EBCDIC
+void initebcdic(void);
+int ebcdic_ctrl(int);
+int ebcdic_unctrl(int);
+int ebcdic_isctrl(int);
+#endif
 /* shf.c */
 struct shf *shf_open(const char *, int, int, int);
 struct shf *shf_fdopen(int, int, struct shf *);
Index: var.c
===================================================================
RCS file: /cvs/src/bin/mksh/var.c,v
retrieving revision 1.190
diff -u -r1.190 var.c
--- var.c	19 Apr 2015 18:51:02 -0000	1.190
+++ var.c	24 Apr 2015 02:12:26 -0000
@@ -510,7 +510,7 @@
 	}
 
 	if (c == '0' && arith) {
-		if ((s[0] | 0x20) == 'x') {
+		if (s[0] == 'x' || s[0] == 'X') {
 			/* interpret as hexadecimal */
 			base = 16;
 			++s;
@@ -553,12 +553,12 @@
 			continue;
 		}
 		if (ksh_isdigit(c))
-			c -= '0';
+			c -= ORD('0');
 		else {
-			c |= 0x20;
+			c = ksh_tolower(c);
 			if (!ksh_islower(c))
 				return (-1);
-			c -= 'a' - 10;
+			c -= ORD('a') - 10;
 		}
 		if (c >= base)
 			return (-1);

build.txt.gz
Description: application/gzip

test.txt.gz
Description: application/gzip
